EXECUTIVE SUMMARY


Database management resulted from a need for data to be retained in the machine beyond the
current run. This need arose in the early prehistory of computers. Since these earliest times of
computer interaction, humans have had difficulty retrieving the data they stored so easily. The
database eras have stretched from secondary storage on cards, tape, drum, and disk through
physical databases to today’s technology of the fully relational database manager and logical
databases in the form of relational views. The progress has been slow and not without difficulty.
The  motivation  for  this  study  was  to  look  at  the  history  of  database  in  order  to  critique  the 
human-computer  interface  with  databases  and  to  project  the  next  areas  of  research  in  database 
management. 

A  human  has  limited  ability  to  store  the  vast  details  about  the  entities  of  his  world.  He  is  usually 
dealing  with  partial  information.  He  would  like  to  move  from  his  state  of  partial  information  to  a 
state of more complete information. Computers, with their enormous capacity to store vast detail,
are a natural extension of man’s capability to use machines to assist him in this movement. But
because  the  user  begins  with  only  partial  information,  he  often  finds  it  difficult  to  retrieve  the 
information  so  easily  stored  at  the  time  of  initial  data  entry. 

A  chronic  consequence  of  man’s  partial  information  in  interacting  with  databases  is  query  failure. 
Query  failure  occurs  in  various  forms:  structured  database  query  failure,  unstructured  database 
query  failure,  and  natural  language  query  failure.  All  database  query  failure  is  frustrating,  but 
natural  language  failure  can  be  especially  insidious  because  it  can  fail  by  giving  erroneous 
information  to  the  user  and  the  user  may  be  completely  unaware  of  the  failure.  In  fact,  emulating 
human-human  communication  (i.e.  natural  language)  may  be  the  wrong  approach  in  attempting  to 
improve  human-database  communication.  This  is  because  human-human  communication  is  based 
on far more than verbal cues; it draws on all the human senses: vision, smell, touch, taste and sound.

A  better  approach  may  be  to  use  methods  of  artificial  intelligence  such  as  semantic  networks  and 
object-oriented  programming  to  create  an  information  sublanguage.  This  information  sublanguage 
does not require the user to have knowledge of database meta-data. It has adaptive methods,
flexible interaction, conceptual pattern matching and disambiguation methods. This paper shows
how all of these methods must be combined into a new (i.e. beyond relational) hybrid database to
strengthen the underlying structure of databases before communication with them can be improved.
This paper suggests research in three directions in order to achieve the strengthened database
structure: (1) building on existing foundations, relational databases; (2) building new foundations,
value based semantic networks; and (3) a new interface paradigm, visual/graphic thesauri.

AN EVOLUTIONARY STEP TOWARD
MORE EFFECTIVE HUMAN-DATABASE INTERACTION


1.  INTRODUCTION 

Natural  language  comprehension  has  proven  to  be  an  enormously  complex  task.  It  is  still  far  off 
as  a  day-to-day  tool  for  effective  human-database  interaction.  The  need  for  a  more  human  style  of 
human-computer  communication  will  not  wait  for  some  distant  future  development.  Although 
interactive  query  and  command  languages  have  proven  vastly  superior  to  batch  oriented,  procedural 
programming  languages  in  terms  of  programmer  productivity,  the  barrier  between  man  and  his 
information  stores  remains  largely  unscathed. 

What  is  needed  is  an  incremental  step,  an  evolutionary  development  to  fill  the  chasm  between  the 
existing  query  languages  of  today  and  the  natural,  human  communication  with  computers  promised 
for  a  distant  tomorrow.  This  step  would  properly  be  filled  with  a  cooperative,  flexible,  and  more 
expressive  hybrid  of  Data  Sublanguages  (such  as  SQL),  Object-Oriented  encapsulation  from 
programming  languages,  semantic  networks  from  artificial  intelligence,  and  visual/graphical 
interface  from  man-machine  studies.  Where  data  sublanguages  require  precise  syntax  and  exact 
matches  to  both  the  structure  and  content  of  the  information  store,  a  hybrid  sublanguage  would  be 
more  forgiving  and  helpful.  The  data  sublanguage  user  must  think  in  terms  of  logical  data 
structures  while  a  hybrid  sublanguage  user  thinks  in  terms  of  the  objects  and  concepts  of  their 
domain  of  interest.  The  hybrid  sublanguage  is  the  means  for  an  information  systems  user  to 
effectively  and  efficiently  navigate  and  cultivate  a  dynamic  information  terrain  with  which  he  is  not 
entirely  familiar.  It  is  not  intended  to  convince  a  user  that  he  is  conversing  with  a  computer-based 
agent  that  understands  the  world  in  a  human  fashion. 

The  report  begins  with  a  look  into  the  history  and  context  of  these  proposed  developments  by 
analyzing  the  evolution  of  database  through  the  current  era.  Then  the  problems  that  plague 
human-database interaction are analyzed and potential solutions with their limitations are
considered.  Lastly,  a  framework  of  research  and  development  is  sketched  out  to  move  into  the 
next  era  of  database. 

The  study  is  flavored  with  several  methodological  principles: 

examination  of  technologies  and  research  traditionally  inside  the  database  area, 

examination of technologies and research traditionally outside the database area,

recognition that human-database interaction runs deeper than the human-computer
interface and critically involves data representation and database design
methodologies, and

avoidance  of  a  single  focused  orientation  (e.g.,  domain  oriented,  user  centered, 
machine  based). 

Today’s  databases  are  the  product  of  decades  of  evolution.  New  developments  must  form  a  natural 
extension of this progression if they are to be easily incorporated into mainstream computing. This
report  is  addressed  to  those  groups  and  individuals  having  any  of  the  following 
goals/agendas/responsibilities: 

transforming  current  databases  with  the  next  generation  of  technology, 

solving  access  problems  with  existing  databases, 

developing  database  applications  over  the  next  decade, 

developing  database  management  systems, 

planning  systems  with  a  mission  critical  database  component,  and 

conducting  research  in  the  database  field. 

2. HISTORICAL CONTEXT


Evolution  of  Database 

The  evolution  of  database  can  be  illustrated  with  a  time  line  from  prehistory  through  the  future. 
Prehistory  is  essentially  the  1920’s,  30’s  and  40’s  -  out  of  which  computer  science  as  a  study  and  a 
professional  discipline  arose.  Database  is  a  research  area  that  emerged  from  the  prehistory  along 
with  artificial  intelligence,  programming  languages,  software  engineering,  and  other  research  areas 
of  computer  science. 

As  shown  in  Figure  1,  the  various  eras  on  the  database  historical  time  line  are:  secondary  storage 
era,  physical  database  era,  logical  database  era  (relational  database  era),  and  the  future. 

Most  people,  whether  they  are  from  the  research  community,  academia,  the  corporate  world,  or  the 
press,  think  that  the  future  of  human  interaction  with  database  should  mimic  the  way  we  interact 
with  each  other  as  humans.  And  that  generally  means  understanding  spoken  and  written 
communications  in  a  "natural  language  interface."  If  a  human  wants  to  know  something  from  a 
database, he simply asks it as he would another human (in natural language, whether typing or
talking) and the database will respond with the requested information. This study will examine
this  premise  and  show  why  it  is  unlikely  to  ever  be  achieved,  at  least  under  the  current  state  of 
database  theory. 

During  prehistory,  database  and  computer  science  were  in  their  infancy.  Computers  filled  large 
rooms and were made of vacuum tubes, not semiconductors. But the most interesting aspects of
prehistory  were  the  users.  Computers,  as  archaic  as  they  were,  were  considered  to  be  very  friendly 
by  the  users,  even  though  programming  was  generally  in  binary. 

The difference, compared to today, is that typical users back then were John von Neumann,
who laid the mathematical foundations of quantum mechanics; Claude Shannon, father of
information theory; and Norbert Wiener, father of cybernetics. Typical users were highly
educated people, often Nobel laureates, and they had no problems interfacing with their
devices. In addition, the tasks they were using the machines for were fairly low level and
straightforward. They were not trying to draw sophisticated conclusions from an accounting
system, but rather just trying to find the results of a specific equation evaluated over a
certain numerical range.

Out of this prehistory (see Figure 1), it was determined at some point that there was a need for
reference to data beyond that which was stored in the memory of the computer. As data sets grew in
size, a need developed for storing data when the power was off, to avoid re-entering data for each
run of the calculations. External physical devices were developed, such as punched tape and
Hollerith cards, magnetic tape and, eventually, the most important device to the database user:
physical disk drives in which data could be accessed randomly. The tremendous advantage of the
disk drive was that the database did not have to be read sequentially from the top to find the
required data, then rewound and the cycle repeated, as with other devices (e.g. tape storage). That
was the beginning of the first real era of database, the era of secondary storage, where secondary
storage is defined as anything from punched cards to on-line, random access devices. This point is chosen as
the  beginning  of  database  because  it  was  the  first  time  functional  users  began  to  gain  access  to 
data  generally  throughout  the  organization. 


1  Superscript  numbers  refer  to  the  list  of  references  in  10.  REFERENCES. 

Figure 1 - Database Evolution (time line)


The problem with pure secondary storage devices was that if the user wanted to access data on the
physical storage device, he needed to know exactly where it was located on the physical device. He
had  to  refer  to  its  physical  location:  what  track  it  was  on,  what  sector,  how  many  inches  from  the 
beginning  if  it  was  on  a  magnetic  tape,  or  where  it  was  in  a  stack  of  punched  cards. 

The  next  stage  in  the  development  of  database  grew  out  of  the  strong  desire  to  refer  to  data 
independent  of  the  location  on  a  physical  device.  And  it  turns  out  the  secondary  storage  era  sowed 
the  seeds  of  its  own  destruction  because  someone  said,  "Why  must  we  address  a  physical  device, 
why  can’t  we  have  a  file  that  is  logical?  Rather  than  knowing  where  our  data  is,  why  can’t  we  just 
name  the  collection  of  data  we  desire  and  let  some  software  find  it  physically  and  bring  it  back  to 
us?"  The  era  of  secondary  storage  gave  birth  to  something  new:  the  concept  of  a  file. 

A  file  is  a  collection  of  data  whose  physical  location  on  secondary  storage  is  transparent  to  the 
user.  The  physical  location,  retrieval,  and  other  operations  on  the  collection  of  data  are  managed 
by  software  written  especially  for  this  purpose  and  which  is  tightly  coupled  to  the  disk  and 
computer hardware. As an example, a text file may be relocated from the outside tracks of a fixed
disk drive to the inside tracks by a backup and restore procedure. Indeed, at certain times in the
history  of  a  file  it  may  not  all  be  physically  located  in  adjacent  storage  areas.  The  physical 
location  of  a  file  has  no  effect  when  a  word  processor  is  instructed  to  load  the  file  into  memory  so 
that  it  can  be  edited. 

Although  the  file  concept  makes  physical  location  of  the  file  on  secondary  storage  transparent, 
locality  of  the  data  within  the  file  remains  as  a  problem  to  the  user.  That  is,  the  data  collected 
within  the  file  is  structured  to  some  extent  (the  minimal  structuring  is  sequential,  one  data  item 
followed  by  another).  The  need  to  know  where  data  resides  on  a  physical  device  is  replaced  by 
the need to know its logical position within the file. Physical navigation is replaced by logical
navigation. Finding the data sought may require positioning a pointer at the logical beginning of
the file and sequentially navigating through it until the desired data are encountered.
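
The logical navigation just described can be sketched in a few lines of Python. This is only an illustration: the record layout, the sample data and the `find_record` helper are assumptions made for the example and are not taken from any particular file system.

```python
import io

def find_record(file, key):
    """Scan a sequential file from its logical beginning until the key is found.

    Returns (records_read, fields), or (records_read, None) if the end of
    the file is reached without a match.
    """
    file.seek(0)  # position the pointer at the logical beginning of the file
    records_read = 0
    for line in file:
        records_read += 1
        fields = line.rstrip("\n").split(",")
        if fields[0] == key:
            return records_read, fields
    return records_read, None

# Hypothetical file contents: employee number, last name, department number.
data = io.StringIO("101,Adams,1\n102,Baker,3\n103,Clark,1\n")
records_read, record = find_record(data, "103")
print(records_read, record)
```

The user no longer needs a track or sector number, but the cost of the search still depends on where the record happens to sit within the file.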

The  amount  of  structure  of  a  file  varies  across  a  continuum  -  from  a  simple  sequence  of  characters 
as  in  the  most  basic  text  file  (relatively  unstructured)  to  the  opposite  extreme,  as  in  a  file  of 
identically  structured  records,  each  record  consisting  of  predefined  fields  of  data,  some  numeric, 
some  alpha,  others  binary  (highly  structured).  Figure  2  below  conveys  this  general  difference.1 

Figure 2 - Unstructured vs. Structured Database


The  polygon  on  the  left  is  not  the  blob,  even  though  it  has  been  known  to  eat  up  large  sums  of 
money,  time  and  patience.  It  represents  an  unstructured  database.  On  the  right  is  a  structured 
database,  depicted  by  a  table  of  records  and  fields.  That  is  the  contrast  between  structured  and 
unstructured  file  storage.  Structured  files  are  files  where  the  meta-data  of  the  data  model  must  be 
predefined and the data filtered into it. Unstructured files hold a mass of text or numbers or
statistics that must be navigated with little or no guidance. Sometimes unstructured files are a
lot  richer.5 

An  example  of  this  is  a  literature  collection  containing  articles  of  professional  interest.  This 
collection  is  an  extremely  rich  source  of  information;  but  one  pays  the  price  for  having  richness  in 
the  ability  to  search  the  database  and  retrieve  desired  information.  Because  both  storage  and 
retrieval  of  the  information  rely  on  the  structure  of  English  and  the  knowledge  humans  have  in 
their brains when they read it, finding requested information is difficult and slow. The database
needs  a  human  being  to  understand  it. 

On the other hand, structured data have been filtered and parsed into various meaningful units,
called fields. The structure is so finely broken up that each unit has only limited meaning, and
this meaning is well defined. Precisely because structured data have been designed with limited
meaning, machine manipulation and interpretation are feasible.4

As  the  file  concept  matured,  files  were  grouped  together  and  put  into  an  organizational  structure. 
So  files  grew  up,  so  to  speak,  and  matured  into  the  concept  of  a  physical  database  containing 
several  related  files  linked  together.3  As  an  example,  take  a  file  of  departments  and  a  file  of 
employees.  These  two  files  might  be  put  into  a  hierarchical  database  with  pointers  from  one  file  to 
the  other.  The  file  for  department  has  information  such  as  the  department  number,  the  department 
name,  and  an  employee  number  for  the  person  who  is  the  manager.  It  would  also  have  pointers  to 
records  of  information  in  the  employees  file.  Each  of  these  records  would  be  identically  structured 
with an employee number, a last name, first name, address, etc. Access to all of those individual
employees cannot be gained directly through the employees file. The department file must first be
traversed to locate a department, and its pointers then followed down to find out which employees
are there. In other words, an employee cannot be accessed directly, but only through his
department. The physical access path has to be followed. The paths are physical because,
although the pointers are unseen, they are located in one file and point to locations in other
files. It is necessary to navigate both between files and within files to locate data. Simple
requests for data require lengthy programs for retrieval of the data.
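
A minimal sketch of this kind of navigation follows. The two in-memory lists stand in for the departments and employees files, and the list indexes stand in for physical pointers; the field names, the sample records and the `find_employee` helper are all hypothetical.

```python
# Hypothetical hierarchical layout: each department record holds pointers
# (here, list indexes) into the employees file.
employees_file = [
    {"emp_num": 101, "last_name": "Adams"},
    {"emp_num": 102, "last_name": "Baker"},
    {"emp_num": 103, "last_name": "Clark"},
]
departments_file = [
    {"dept_num": 1, "name": "Shipping", "manager": 101, "employee_ptrs": [0, 2]},
    {"dept_num": 3, "name": "Sales", "manager": 102, "employee_ptrs": [1]},
]

def find_employee(emp_num):
    """Locate an employee by following the access path department -> employee."""
    records_visited = 0
    for dept in departments_file:          # navigate the departments file first
        records_visited += 1
        for ptr in dept["employee_ptrs"]:  # then follow its pointers downward
            records_visited += 1
            employee = employees_file[ptr]
            if employee["emp_num"] == emp_num:
                return employee, dept["name"], records_visited
    return None, None, records_visited

employee, dept_name, visited = find_employee(102)
print(employee["last_name"], dept_name, visited)
```

Even in this tiny example, a request for one employee touches every department record that precedes his own, which is why simple requests needed comparatively long retrieval programs.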

Physical  databases  had  offspring  too.  Physical  databases  gave  rise  to  the  notion  of  a  logical 
database.  The  motivation  for  logical  databases  was  to  gain  access  to  data  without  explicitly 
following  the  physical  path.  In  the  context  of  the  employee  example,  why  can’t  the  employee  data 
be  accessed  without  passing  through  the  departments  file?  Ted  Codd,  father  of  the  relational 
database  model,  developed  the  notion  of  a  logical  database  and  developed  the  theory  of  logical 
databases  on  a  rigorous  mathematical  foundation.  In  a  relational  database,  software  handles  the 
access  paths  and  navigation,  almost  transparently  to  the  user,  but  problems  still  exist.  For  example, 
the  user  must  have  knowledge  of  the  structure  definition  of  the  database  (the  meta-data)  in  order  to 
effectively  use  the  database. 
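
Both halves of this observation can be illustrated with a short sketch. It uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in for a relational database manager; SQLite is not one of the products discussed in this report, and the table contents and the guessed column name `Surname` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (DepartmentNum INTEGER, Name TEXT)")
conn.execute(
    "CREATE TABLE employees (EmpNum INTEGER, LastName TEXT, DepartmentNum INTEGER)"
)
conn.execute("INSERT INTO departments VALUES (1, 'Shipping')")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(101, "Adams", 1), (102, "Baker", 1)],
)

# The employee is reached directly; no path through departments is followed.
row = conn.execute(
    "SELECT LastName FROM employees WHERE EmpNum = ?", (102,)
).fetchone()
print(row)

# The user must still know the meta-data: a guessed column name is rejected.
try:
    conn.execute("SELECT Surname FROM employees WHERE EmpNum = 102")
    metadata_error = None
except sqlite3.OperationalError as exc:
    metadata_error = str(exc)
print(metadata_error)
```

The access path has disappeared from the user's view, but the exact table and column names have not.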


Logical  database  is  maturing  too.  In  1988,  fully  relational  databases  are  appearing  in  the 
marketplace.  As  logical  database  is  maturing,  the  question  is,  "What  kind  of  offspring  is  it  going  to 
give  rise  to?"  Actually,  that’s  a  question  of  "What  do  we  want  it  to  give  rise  to?"  So,  there  are 
some  question  marks.  What’s  going  to  be  the  new  infant?  What’s  it  going  to  mature  into  when  it 
gets  a  little  older?  In  the  sections  that  follow,  these  questions  will  be  addressed  and  a  radically 
new  database  model  will  be  proposed  that  allows  for  innovative  human-database  interaction. 


Evolution  of  Usability 

This evolution, if examined, has not really given rise to more powerful or faster tools. Today,
operations with the latest fifth generation database management system are not going to be any
faster than they would be if the old software were running on the latest hardware. The
performance of the old software would not be significantly less than that of the latest fifth
generation software. In fact, because the old generation software was finely tuned to the
application by clever programmers, it is likely to perform better.
So  the  evolution  of  database  does  not  represent  an  evolution  of  greater  power  or  speed.  The 
fascinating  aspect  is  that  it  represents  an  evolution  of  usability  and  more  effective  user  interaction.7 

In  the  secondary  storage  era,  databases  were  addressed  at  a  very  low  level  and  it  took  someone 
like  a  Nobel  laureate,  or  someone  very  intelligent  and  very  capable,  to  do  it  (over  long  periods  of 
time). Moving to the era of files and physical databases, it became much easier. Programmers
could access data. They did require training and had to be expert at what they were doing. But
they didn’t have to be Nobel laureates.

And now, as the logical database (relational) era evolves, people claim that anybody can access
data. Of course, that's not quite true yet. Limited natural language interfaces (Natural Language
Interface, Intellect, Clout) help, but they have serious inherent deficiencies. They are making
access easier and further removed from the physical databases that underlie it all. It is important to note that this
evolution  of  database  is  really  the  evolution  of  human  computer  interaction  to  provide  better  and 
more  effective  interaction,  not  speed  and  power  from  a  software  standpoint.1 


Evolution  of  Non-Database  Areas 

As  noted  previously,  prehistory  spawned  many  areas  of  computer  science  research.  Examples 
include:  artificial  intelligence,  software  engineering,  programming  languages,  and  many  more.  In 
early  database  development,  these  disciplines  had  no  impact  on  the  database  evolution.  These 
disciplines are now impacting database development as computer science itself emerges as a
mature  discipline.  In  addition  there  are  other  scientific  disciplines  outside  computer  science  which 
are  impacting  database  design.  These  include  design,  development,  and  management 
methodologies  from  the  engineering  world,  theories  from  cognitive  science,  and  requirements  from 
application  areas  (e.g.  iterative  design  methods.)’ 

Programming  languages  followed  a  similar  and  converging  evolution.  They  started  at  the  physical 
level  where  the  user  physically  programmed  individual  logic  circuits.  They  moved  up  to  the  level 
of  assembly  language  where  machine  instructions  were  addressed  in  a  specific  but  still  very  low 
level  programming  language.  They  then  moved  up  to  third  generation  programming  languages 
such  as  "C",  Basic,  Fortran  and  Cobol,  where  the  programmer  dealt  with  a  logical  machine,  not  the 
real  machine,  making  programs  more  easily  understood.  Now,  database  and  programming 
languages  are  merging  into  fourth  and  fifth  generation  languages  which  are  even  further  removed 
from  the  machine.  That  course  is:  let’s  get  further  and  further  from  the  real  hardware  and  let  us 
interact  instead  with  a  virtual  machine.  Although  database  has  evolved  in  near  isolation  from  the 
other  disciplines,  continued  evolution  should  be  far  more  integrated  with  the  rest  of  the  world. 

3. THE DATABASE PROBLEM


Query  Failure 

It's important to consider the purpose of databases and why humans want to interact with them.
Databases know something the human doesn’t know, or something he once knew and forgot. A human
has limited ability to store vast specific detail about the entities of his world. Therefore he is almost always
working  with  partial  information.  He  would  like  to  get  from  a  state  of  having  partial  information 
to  having  full  information.  But  because  he  has  only  partial  information  it’s  not  easy  to  pinpoint 
where  the  information  is  and  how  to  get  to  it.  This  is  the  problem.  Before  something  can  be 
found,  it  must  already  be  known.  A  chronic  consequence  of  the  problem  is  query  failure.  Three 
examples  of  query  failure  will  be  examined:  structured  data,  unstructured  data,  and  natural  language 
access  methods.10 

Structured  database  query  failure 

Consider first a structured query example using an employees table and Structured Query
Language (SQL). The database has an employees table with columns as shown in Figure 3:


EMPLOYEES

EmpNum | Salary | Age | DepartmentNum | LastName | FirstName

Figure 3 - Employees Table


Assume  the  user  wants  to  find  all  the  employees  aged  25  years  or  younger  in  Department  2  that 
earn more than $35,000 a year. The query is written as shown below (no particular vendor's
dialect of SQL is used):


SQL>  Select EmpNum, LastName from employees
      where salary > 35000
      and age <= 25
      and DepartmentNum = 2;


What  if  the  system  responds  as  follows: 

0  rows  selected 
SQL> 

What does that mean? Well, obviously it means there are no employees in Department 2 who
earn more than $35,000 and are 25 years old or younger. But the real question is how to
interpret the negative response. Does it mean that no one 25 years of age or younger has been able
to be promoted rapidly enough to earn more than $35,000? A second query is formulated
by revising the age limitation upward and leaving the rest of the query alone. This is a standard
strategy in solving query failure: change one element of the query and see what happens. The
second version of the query is:

SQL>  ...age < 30

Once  again  a  negative  response  is  obtained: 

0  records  selected 
SQL> 

Try  reducing  the  salary  limitation: 

SQL>  ...salary  >  30000  and  age  <  30 

Yet  again: 

0  records  selected 
SQL> 

Variations of the query could be continued until the user is exhausted. He may eventually give
up. Or, he may finally discover there is no Department 2. He could have queried on Department
2 forever and never found a positive response. Or he might find that Department 2 is Sales and
all employees there are on strict commission. They don’t earn a salary. Unfortunately the problem
would not be identified by the database for the user. This query, like so many queries, has failed.

In the case above, if the user is a naive user and doesn’t know much about the company and
the data, he will have to do a lot of work to overcome the query failure - even with just one table
of data, not hundreds or thousands of tables. To move from a state of ignorance to some state
of knowledge, from having partial information to complete information, additional information is
required (e.g. knowledge of meta-data). Query failure happens because the database system has
not helped the user in progressing from partial information to complete information.

The  database  is  not  acting  in  a  cooperative  or  adaptive  manner.  It  insists  the  user  know  in 
advance  the  information  he  wants  to  find  and  the  manner  in  which  the  database  knows  it.  Only 
then  can  a  user  ask  the  database  for  data  and  obtain  positive  results  right  away.  That  defeats  the 
purpose of the database and places barriers between many users and their data. Even in the era of
the fully relational database, which is only now dawning, query failure is a significant
problem.
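
One form that more cooperative behavior could take is sketched below: when a query returns no rows, the system tests each condition on its own and reports any condition that, by itself, already matches nothing. This is an assumption about how such help might look, not a feature of any product named in this report. The sketch uses SQLite through Python's standard `sqlite3` module as a stand-in, and the sample rows and the helpers `count_where` and `diagnose` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (EmpNum INTEGER, LastName TEXT, FirstName TEXT, "
    "DepartmentNum INTEGER, Salary INTEGER, Age INTEGER)"
)
# Hypothetical data: note that no row belongs to Department 2.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (101, "Adams", "Ann", 1, 42000, 24),
        (102, "Baker", "Bob", 1, 31000, 29),
        (103, "Clark", "Cy", 3, 38000, 41),
    ],
)

def count_where(predicates):
    """Count the employees rows that satisfy every predicate in the list.

    The predicates are trusted SQL fragments written for this illustration.
    """
    where = " AND ".join(predicates) if predicates else "1 = 1"
    sql = f"SELECT COUNT(*) FROM employees WHERE {where}"
    return conn.execute(sql).fetchone()[0]

def diagnose(predicates):
    """Return the predicates that, taken alone, already match no rows."""
    if count_where(predicates) > 0:
        return []  # the query succeeded, so there is nothing to explain
    return [p for p in predicates if count_where([p]) == 0]

query = ["Salary > 35000", "Age <= 25", "DepartmentNum = 2"]
print(count_where(query))
print(diagnose(query))
```

With the sample rows above, the diagnosis singles out the department condition, sparing the user the trial-and-error revisions of age and salary. It would not catch the commission case, where each condition matches some rows but the combination matches none; that would require further checks on pairs of conditions.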

Unstructured database query failure

Medline is a very large unstructured database of medical information for doctors. It is so
large that browsing is virtually impossible. Browsing it would be about as easy as walking around a
large university library to find the book you want if the books were all in a heap on the
floor. Not very helpful.

A  study  was  done  in  which  queries  were  issued  to  the  Medline  system.  A  doctor,  untrained  in 
Medline,  and  a  librarian,  trained  in  the  on-line  information  system  for  Medline,  were  utilized  in  the 
study. The doctor told the librarian what he wanted, and the librarian formulated queries to the system.
In querying the system, only 20% of what was actually in the database and relevant to the query
was returned. In addition, 20% of the responses were not on target with the query. The result is that
four times as much relevant information remained in the database as was returned. Given that only 80% of
the return was useful, only 16% of the useful information was found. That’s another instance of
query failure."
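
The arithmetic behind these figures can be checked in a few lines. The sketch below takes the percentages quoted above at face value and reads the 20% as the volume returned relative to the relevant material in the database; that reading, and the variable names, are assumptions made for illustration.

```python
# Figures quoted in the text, taken at face value.
returned_share = 0.20  # volume returned, as a share of the relevant material
useful_share = 0.80    # share of the returned material that was on target

# Material left behind for every unit returned.
remaining_ratio = (1 - returned_share) / returned_share

# Share of the relevant material that was both returned and on target.
found_share = returned_share * useful_share

print(round(remaining_ratio, 2), round(found_share, 2))
```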

Why does this happen? First, no human being is browsing the
entire database, reading each journal to answer a particular query. The database had to be indexed
in some fashion, perhaps through keywords or abstracts. The people who made those abstracts or
defined those keywords did not have this particular doctor’s query in mind. Perhaps additional
knowledge also came to light after the journals and periodicals were indexed. As a result, the
system found only a small fraction of the information that was available. Is this important? The
reason for the existence of databases is to answer queries. If they can’t do it, they are not serving
their purpose, so it certainly is significant and important.


Natural language query failure

Can  natural  language  help  here?  The  authors  attended  a  demonstration  of  a  recent  natural  language 
product currently being sold as a front end to the Ingres database manager. The name of the
product was Natural Language Interface (NLI). A database in Ingres that had employees,
employee  numbers,  their  salaries  and  some  other  information  in  a  table  was  used  in  the 
demonstration.  A  query  was  issued  in  natural  language  using  NLI.  The  query  was: 

"What  is  the  average  salary  for  employees  in  the  shipping  department?" 

The  response  to  the  query  was: 

"Average  Salary  =  $25,000". 

But several questions immediately come to mind when an average salary is computed (not to
mention the nuisance of typing a lengthy query). For example: what was the denominator
used to get the average? That is, how many employees were used to compute the average?
What  was  the  range  of  the  salaries  used  in  the  computation?  Was  it  one  very  small  salary  and  one 
very  large  salary  which  gives  a  totally  meaningless  average?  Or  was  it  35  different  individuals 
with  salaries  very  closely  clustered  together?  Different  interpretations  of  the  average  salary  are 
inferred  based  on  the  dispersion  and  the  number  of  the  salaries  used  for  computation. 
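
The questions raised above can be made concrete with a small sketch. It is not part of NLI or
Ingres; the function name and the salary figures are made up for illustration. It simply returns
the count, range, and spread next to the average so the user can judge how much the average means:

```python
import statistics

def describe_salaries(salaries):
    """Return the average together with the context needed to interpret it."""
    if not salaries:
        return None  # no rows matched, so an average would be meaningless
    return {
        "count": len(salaries),                  # the denominator of the average
        "mean": statistics.mean(salaries),
        "min": min(salaries),
        "max": max(salaries),
        "spread": statistics.pstdev(salaries),   # population standard deviation
    }

# Two hypothetical departments that share the same $25,000 average.
two_extremes = describe_salaries([5000, 45000])
clustered = describe_salaries([24000, 25000, 26000])
```

Both results report the same mean, but the count and spread show that the first average rests on
two widely separated salaries while the second rests on a tight cluster.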

When the follow-up queries suggested above were put to the database, multiple and differing responses
were returned. Ultimately, it was discovered that the average given in the original response was
not even correct! The query failed because of ambiguity, yet there was no indication of a problem
when the original response was given. The user had no sign that the query had failed; he had simply
been given incorrect information. This is an example of an extremely dangerous and insidious
form of query failure. Instead of the database saying it is unable to answer, it answers with
erroneous information, and the user is given no indication of a problem with the query!

The  queries  continued  into  this  same  database.  The  example  question: 

NLI>  "Is  Jeremy  rich?" 

The  system  responded, 

NLI> "Jeremy is rich."

But next, the query was made with "rich" having the double meaning of the name of an entity and
the concept of wealth. How does one interpret this inside the database context? As a rule, natural
language systems cannot assign multiple meanings to the same term. And, therefore, the query:

NLI>  "Is  Rich  rich?" 

failed.  NLI  couldn’t  handle  the  multiple  meanings  inherent  in  homonyms  (words  that  sound  alike 
but  have  different  meanings)  even  though  the  database  did  have  information  on  salary  for  all  the 
employees,  including  Rich.  The  system  would  not  allow  assignment  of  a  wealth  meaning  to  "rich" 
(salary  greater  than  $50,000  a  year)  and  simultaneously  the  meaning  of  an  entity  contained  in  the 
database  (the  individual  named  Rich).  This  was  a  significant  problem  with  NLI  and  the  system 
gave  no  help  at  all  when  it  failed.12 


Adaptive  methods 

What technologies, methods, philosophies, etc. are available for addressing these database
problems? What is needed is an adaptive component in the database: adaptive in that it can take
what the user knows about the database, its structure, and its content, match that to what the
database knows, and return the best possible answers. A number of adaptive methods are available,
and these are discussed in the following paragraphs.13


Pattern-matching

At the lowest level is character pattern matching. One possible source of query failure is
a typographical error in the query, which the database query parser doesn’t understand.
If there is a character pattern matching component in the database query parser, it could
provide possible alternatives. For example, suppose a query into a database of cars is:

I  am  looking  for  a  porsh. 

The  database  should  respond  with: 

I don’t understand porsh. Do you mean a Porsche?

If  it  can’t  do  that,  it’s  not  helping  the  user  get  from  his  state  of  partial  information  to  a  state  of 
more  complete  information.  Spelling  checkers  are  generally  available  today  and  adaptive  pattern 
matching  algorithms  exist  which  can  be  used  to  give  spelling  support.  Paradox  from  Ansa 
Software,  for  example,  provides  some  of  this  kind  of  assistance. 
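
A minimal sketch of this kind of spelling support follows, using the approximate string matching in
Python's standard difflib module. The vocabulary list and the function names are assumptions made
for illustration; a real system would draw its vocabulary from the values stored in the database.

```python
import difflib

# Hypothetical vocabulary drawn from the values stored in a cars table.
KNOWN_MAKES = ["Porsche", "Ford", "Toyota", "Peugeot", "Pontiac"]

def suggest(term, vocabulary=KNOWN_MAKES, cutoff=0.6):
    """Return known terms that closely match a possibly misspelled term."""
    lowered = {word.lower(): word for word in vocabulary}
    if term.lower() in lowered:
        return [lowered[term.lower()]]
    # get_close_matches ranks candidates by similarity ratio, best first.
    close = difflib.get_close_matches(term.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[word] for word in close]

def respond(term):
    """Build the kind of reply the text asks for when a term is not recognized."""
    matches = suggest(term)
    if not matches:
        return f"I don't understand {term}."
    if matches[0].lower() == term.lower():
        return f"Searching for {matches[0]}."
    return f"I don't understand {term}. Do you mean {' or '.join(matches)}?"
```

The cutoff of 0.6 is difflib's default; raising it makes the suggestions more conservative.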

Disambiguation 

A  step  up  the  adaptive  methods  ladder  from  character  pattern  matching  is  disambiguation. 
Reconsider  the  query 

Is  Rich  rich? 

The system must disambiguate the double meaning of "rich" in the query. The first use is as the
name of an employee in the context of the employees table and the second is a question about the
wealth of the employee Rich. This is a much more complex effort because not only does the
program code need to disambiguate the meaning of the terms, it must also have information and
knowledge about what they could mean in different contexts. This implies the existence of a
sophisticated thesaurus and algorithms to do the disambiguation.

What does this mean for a database? In order to get adaptive interactions that include
disambiguation, the structure of the database needs to be sufficiently strong that one term can
have many meanings. And it must hold those meanings in a form that the database itself can use
to resolve the ambiguity. In other words, to make interaction with the database
more friendly and effective, the underlying structure of the database needs redesign. The existing
database design model is not sufficiently robust to simply put a front-end processor on it and
obtain the desired results.14

A lot of marketing people and researchers want you to think that you can take an intelligent
interface and throw it on top of a stupid database and have an effective interaction. But effective
interaction isn’t just an intelligent interface. It needs a foundation underneath to support it. The
failure of the NLI query cited above illustrates that, in order to disambiguate, the system needs a
sufficiently strong foundation in the database to express multiple meanings for terms so the
database can do the disambiguation.
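
What "multiple meanings for one term" might look like can be sketched as follows. This is only an
illustration under stated assumptions: the lexicon layout, the function names, and the sample rows
are hypothetical, and the $50,000 threshold for "rich" is the one quoted in the NLI example. Each
term carries a list of senses, and the slot the term fills in the question selects the sense.

```python
# Hypothetical lexicon: one term may carry several senses, each tagged with
# the role it can play in a question of the form "Is <subject> <attribute>?".
LEXICON = {
    "rich": [
        {"role": "entity", "column": "first_name", "value": "Rich"},
        {"role": "predicate", "test": lambda row: row["salary"] > 50000},
    ],
    "jeremy": [
        {"role": "entity", "column": "first_name", "value": "Jeremy"},
    ],
}

EMPLOYEES = [  # made-up rows for illustration
    {"first_name": "Rich", "salary": 30000},
    {"first_name": "Jeremy", "salary": 80000},
]

def senses(term, role):
    """Return the senses of a term that can fill the given role."""
    return [s for s in LEXICON.get(term.lower(), []) if s["role"] == role]

def answer_is_x_y(subject, attribute):
    """Answer 'Is <subject> <attribute>?', choosing each sense by its slot."""
    entity = senses(subject, "entity")
    predicate = senses(attribute, "predicate")
    if not entity or not predicate:
        return None  # the lexicon cannot interpret the question
    rows = [r for r in EMPLOYEES if r[entity[0]["column"]] == entity[0]["value"]]
    if not rows:
        return None
    return all(predicate[0]["test"](r) for r in rows)
```

With these rows, answer_is_x_y("Rich", "rich") resolves the first "Rich" to the employee and the
second to the salary test, which is the step the demonstration system could not take.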

Another  new  product  on  the  market  is  Lotus  Agenda.  Lotus  Agenda  lies  somewhere  between 
structured  and  unstructured  in  that  the  user  doesn’t  have  to  predefine  a  structure,  but  Agenda 
internally  creates  something  akin  to  fields  and  records.  It  does  pattern  matching  to  the  extent  that 
if  the  user  is  doing  something  for  Sue  it  will  match  on  the  word  Sue  and  pull  up  everything  it 
knows  about  Sue.  It  does  not  do  disambiguation.  If  there  are  two  people  named  Sue  in  the 
database,  the  last  name  is  required  or  the  query  will  fail.  Or,  worse  yet,  don’t  ask  to  sue  your 
insurance company!15

Relativism 

The next area of adaptivity is relativism, a more abstract form of disambiguation: more abstract in
that the multiple meanings are related. An example of relativism is the term marriage. From
different perspectives the meaning is different. Consider marriage from the point of view of the
catering company. To them a marriage is an event they have to schedule. In the eyes of the
government, it’s a legal entity requiring a license and an entry in the database of vital statistics.
And lastly, consider a marriage in the eyes of the husband and wife, who consider it a relationship.
So marriage is an event, an entity, and a relationship, depending on perspective.

Relativism, like homonyms, must be disambiguated. Relational databases and extended relational
databases such as Codd’s Relational Model/Tasmania (RM/T) cannot deal with this problem. Only
the most advanced artificial intelligence formalisms, such as semantic networks, can represent
something like this. Overcoming the database problem where relativism is involved requires a
representational formalism that is very robust. Current database models simply do not have the
foundation to disambiguate relativism.


Presuppositional analysis

Presuppositional analysis was created by a number of people at about the same time in the late
1970s. It’s ironic that one of those people later happened to be the head of Research and
Development for Lotus Corporation while they were developing Agenda, and yet Agenda has no
presuppositional analysis in it.16

Presuppositional analysis asks, "If you say something or ask a question, what does that question
presuppose?" For example, consider the question:


Which employees in the company less than 25 years old earn more than $35,000
and are in department 2?

Almost unconsciously, the user presupposes a number of things by this query:


there is a Department 2,

the company has employees,

the employees in Department 2 have earnings, and

the employees have ages.

The thrust of presuppositional analysis is that, when a query fails, the presuppositions are arranged
into a hierarchy. The query is a massive, complex presupposition which can be decomposed into
components. For example, employees with salaries in Department 2 presupposes the existence of
a Department 2, the existence of employees, and the concept of salary (or possibly commissions).
The existence of Department 2 presupposes the existence of a company. In this way a hierarchy
is created with each stage less and less specific.

If  the  system  can  use  presuppositional  analysis  it  can  determine  presuppositions  and  when  a  query 
fails  it  can  start  going  down  the  hierarchy  to  find  the  presupposition  that  failed.  Recalling  the 
query  failure  in  the  employee  example  given  above,  presuppositional  analysis  would  determine  the 
faulty  presupposition  of  the  existence  of  a  Department  2.  The  system  could  then  inform  the  user 
of  the  problem  with  the  query  and  even  explain  why  it  failed. 
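
The walk through the hierarchy can be sketched as below. This is an illustration only: the table
layout, the sample rows, and the function names are assumptions, and the presuppositions are listed
by hand rather than derived by parsing the query as a real presuppositional analyzer would do. The
sketch checks each presupposition from least to most specific and reports the first one that fails.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with made-up rows and no Department 2."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE departments (dept_num INTEGER, name TEXT)")
    db.execute("CREATE TABLE employees "
               "(emp_num INTEGER, age INTEGER, salary INTEGER, dept_num INTEGER)")
    db.executemany("INSERT INTO departments VALUES (?, ?)",
                   [(1, "Shipping"), (3, "Accounting")])
    db.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)",
                   [(3001, 24, 40000, 1), (3002, 31, 52000, 3)])
    return db

# Presuppositions of the example query, ordered from least to most specific.
# Each SQL statement returns a row only if its presupposition holds.
PRESUPPOSITIONS = [
    ("the company has employees",
     "SELECT 1 FROM employees LIMIT 1"),
    ("there is a Department 2",
     "SELECT 1 FROM departments WHERE dept_num = 2 LIMIT 1"),
    ("Department 2 has employees",
     "SELECT 1 FROM employees WHERE dept_num = 2 LIMIT 1"),
    ("Department 2 has employees under 25",
     "SELECT 1 FROM employees WHERE dept_num = 2 AND age < 25 LIMIT 1"),
]

def explain_failure(db):
    """Return a message naming the first presupposition that does not hold."""
    for description, sql in PRESUPPOSITIONS:
        if db.execute(sql).fetchone() is None:
            return f"Your query assumes that {description}, but that is not the case."
    return None  # every presupposition holds; the failure lies elsewhere
```

Against the demo data, explain_failure(build_demo_db()) reports the missing Department 2 instead of
returning an unexplained empty result.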

A  feature  of  presuppositional  analysis  in  its  basic  form  is  that  it  requires  no  special  foundation 
underneath  the  front-end.  It  has  been  implemented  on  top  of  semi-relational  databases.  This 
feature  is  achieved  by  placing  tremendous  demands  on  hardware  processing  power  while  making 
minimal  use  of  the  underlying  database  structure.  Presuppositions  are  identified  and  hierarchically 
ordered  syntactically.  By  parsing  a  query  by  syntax,  presuppositional  analysis  aims  to  roughly 
identify  what’s  wrong  with  the  user’s  query;  it  does  not  address  the  more  positive  problem  of 
assisting  the  user  in  formulating  a  new  and  correct  query.  This  is  not  the  ideal  way  of  helping  a 
user  traverse  the  distance  between  partial  information  and  complete  information. 


Conceptual pattern processing

A step beyond presuppositional analysis is proposed by the authors: conceptual pattern processing.
Conceptual pattern processing does not seek to inform the user of errors in his query, as
presuppositional analysis does; rather, it assumes the user is asking for valid data. Conceptual
pattern processing therefore uses information in the database to discover the data that best meets
the request. In other words, it looks at the syntax of the query, the structure of the database,
and the semantic content of the query simultaneously to determine suggested lines of future
queries.17

Rather than only looking for the syntactic component of a query that fails, a subregion of the
database that comes closest to answering the query is sought. How is closeness measured? By
examining the meaning of the terms in the query - the concepts the terms refer to given the overall
context. Thus the analysis looks at conceptual closeness.18

Conceptual pattern processing may not be able to derive a single closest solution.
There might be many, in which case it would show all of them and allow the user to examine the
trade-offs. He would know his options. Conceptual pattern processing takes an optimistic attitude
rather than a pessimistic attitude toward query failure, telling the user about success rather than
failure. For example, extending the earlier example of structured query failure and considering the
case where Department Number = 2 exists and is sales (with salespeople earning commission only), the
system, enhanced with conceptual pattern processing, might respond:

There are no records which meet your request exactly. You may now -

(1)  Cancel  your  query 

(2)  Edit  your  query 

(3)  Let  me  examine  the  database  and  identify  rows  that  are  closest  to  meeting 
your  request. 


If the user now chooses (3) and continues, the system might respond:


I have identified 2 distinct groups of rows from employees that
are closest to meeting your request. Their descriptions follow.

Choose the group you would like returned.

(1)  Select EmpNum, LastName
     from Employees
     where Salary > 35000
     and Age <= 25
     and DepartmentNum <> 2;

(2)  Select EmpNum, LastName
     from SalesPeople, Employees
     where SalesPeople.EmpNum = Employees.EmpNum
     and SalesPeople.Commission > 35000
     and Age <= 25
     and DepartmentNum = 2;


The user doesn’t know what trade-offs he is willing to make until he knows what trade-offs are
available. Conceptual pattern processing therefore doesn’t require prioritization of the query to
make one attribute more important than another; it lets the user know what the choices are.
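
The idea of offering the nearest groups of rows can be sketched very roughly as follows. This is a
crude stand-in, not the authors' method: closeness here is measured only by how few conditions must
be dropped before some rows match, whereas conceptual pattern processing measures closeness by the
meaning of the terms. The rows, condition names, and function name are made up for illustration.

```python
from itertools import combinations

EMPLOYEES = [  # made-up rows for illustration
    {"emp_num": 3001, "age": 24, "salary": 40000, "dept": 1},
    {"emp_num": 3002, "age": 23, "salary": 20000, "dept": 2},
    {"emp_num": 3003, "age": 45, "salary": 60000, "dept": 3},
]

# The conditions of the example query, each kept separate so it can be relaxed.
CONDITIONS = {
    "age < 25": lambda r: r["age"] < 25,
    "salary > 35000": lambda r: r["salary"] > 35000,
    "dept = 2": lambda r: r["dept"] == 2,
}

def closest_groups(rows, conditions):
    """Return (dropped_conditions, matching_rows) pairs for the smallest
    number of dropped conditions that yields any rows at all."""
    names = list(conditions)
    for dropped_count in range(len(names) + 1):
        groups = []
        for dropped in combinations(names, dropped_count):
            kept = [conditions[n] for n in names if n not in dropped]
            matches = [r for r in rows if all(test(r) for test in kept)]
            if matches:
                groups.append((dropped, matches))
        if groups:
            return groups  # all alternatives at this level, with no ranking among them
    return []
```

No condition is given priority over another: every alternative at the same level of relaxation is
returned, and the choice among them is left to the user.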

Currently,  conceptual  pattern  processing  requires  a  large  amount  of  intelligence  in  the  system. 
Humans  must  tediously  enter  that  intelligence  into  the  system;  however,  it  does  lead  to  more 
effective  interaction  as  demonstrated  by  the  prototype  system  Proto  Atlas.  Future  developments  in 
neural  network  systems  that  support  self  organization  by  the  system  should  simplify  this 
knowledge  acquisition  problem  and  support  faster  measurements  of  conceptual  closeness. 


Flexibility 

Another problem at the interaction level is that of flexibility versus inflexibility. Database
systems require that the user specify references to the things he wants in the way the system wants
them, generally as text of a certain data type. In the employee example given above, if the user
wants a certain employee’s name he may enter that employee’s number. This number must be specified
exactly. If it is specified with extra blanks or hyphens, the system is not going to acknowledge the
query properly. In fact, the query probably won’t make it all the way to query failure. It won’t
even be processed as a query because of parser failure. Beyond this, the user must tell the database
something about the data he is requesting in the query. For example, he can’t just say:

Tell  me  everything  about  John  Doe. 

Rather,  he  must  specify  the  fields  requested: 

"Tell  me  the  salary  and  age  of  the  employee  in  the  employee  table  whose  first 
name  is  ’John’  and  whose  last  name  is  ’Doe’". 

The user must know the structure of the database. He can get to everything, but he has to know
the names of the columns and he has to specify the data correctly: enclosed in quote marks
(single quotes, not double quotes), spelled correctly, and so on. It’s not flexible. Additionally,
he can’t say:

Show  me  someone  like  John  Doe  but  who’s  a  little  bit  older. 

The system requires very formal communication. The user also has to make proper references by
direct identifiers that the system has been programmed to understand. He
cannot use analogies, metaphors, or descriptions. On a more difficult plane, he can’t simply point
to a physical object and request the system to tell him everything about that object. He can’t use
any visual or other sensorial references because the system is primarily inflexible. How can the
problem of flexibility be attacked? It requires underlying changes to the structure of the database.
It requires a form that is as expressive as the world is varied.19
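
The simplest of the inflexibilities described above, the employee number typed with extra blanks or
hyphens, can at least be softened at the interface. The sketch below is an illustration under
assumed names; it does nothing for the deeper problems of analogy or description, which the text
argues need structural changes.

```python
import re

def normalize_employee_number(raw):
    """Strip blanks, hyphens, periods, and underscores from a typed employee number."""
    digits = re.sub(r"[\s\-._]", "", raw)
    if not digits.isdigit():
        # Refuse anything that is still not a plain number rather than guess.
        raise ValueError(f"not an employee number: {raw!r}")
    return int(digits)
```

With this in front of the parser, an entry such as " 30-08 " reaches the query as 3008 instead of
causing a parser failure.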


Adaptive  methods  summary 

Simple character pattern recognition to address spelling errors and basic presuppositional analysis
are the only processes giving adaptive and flexible behavior to a database system that do not
require substantial changes to the underlying structure of the database. Disambiguation, relativism,
and conceptual pattern processing all require structural changes. Addressing the problem of
flexibility also requires substantive changes in the structural design of databases. The result is
that, when it comes to making databases more adaptive and flexible and to making significant strides
in effective human-database interaction, looking only at the interface is insufficient. There will
have to be significant changes to the underlying structure of the database itself.20

You can train a monkey to be real polite but he’s not going to be very informative. You haven’t
changed anything by teaching him a friendly grin. He can’t tell you any more than he knows. And
that’s the problem with the attitude that "We can dump an intelligent interface on top of data that
doesn’t have very much knowledge".21


4. DATA DICHOTOMIES


The data:meta-data and the data-use:data-design dichotomies are major limitations of current
database technology for developing effective user interaction interfaces. These limitations are a
direct result of the fact that data models are representations of data, and the data are in turn
representations of the physical world. Therefore one is modeling a model rather than modeling the
real entity. The relational model is the relational model of data. The user is utilizing
sophisticated query tools but is interacting with simple, primitive data, the same data that is on
the physical devices. The user just has a more elegant, higher-level interface to it. He is not
dealing with a high-level view of the world; he is not dealing with concepts.22

The  database  designer  had  to  come  into  the  company,  look  around,  use  his  mental  faculties  and 
say,  "There  are  employees.  There  are  departments.  There  are  relationships."  He  did  the 
conceptualization.  And  as  a  second  step  he  figured  out  the  entity  types  that  were  necessary  to 
organize  the  data  for  automation.  This  process  is  named  data  modeling.  There  are  people  who 
have  a  high  level  of  expertise  in  data  modeling,  which  includes  designing  schemas.  This  is  not  a 
trivial task.23


Roots  of  Data:Meta-Data  Dichotomy 

The difference between structured and unstructured databases is quite important. A good
example of the structured kind is a corporate management information system. An unstructured
example is an Information Storage and Retrieval System (ISARS). A good example of an ISARS is the
library. You are storing documents, and you want to retrieve documents. That is your primary task.
That is what you are interested in doing. How do you do that?


Figure  4  -  World  Modeling  vs.  Data  Modeling 


An ISARS holds a collection of documents, books, journals, etc. Eventually, the collection gets too
big and overwhelming to deal with directly. Simplification becomes necessary. We need to deal with
something similar but not as hard to deal with. A model of the collection is created, and the model
is in a form that is much easier to deal with. Instead of a big heavy book, there is a small
synopsis on an index card. So in a simple case, the model of the collection of documents might
be a card catalog. If you want to find a book you look through the model. You manipulate the
model. You do a query on the model rather than going through the entire collection of books.
You find a reference to what you want and then you go back to the collection of books and check
out the real one.

Modeling  is  a  representation  that  compresses  space  and/or  time.  In  the  card  catalog  a  large  library 
spread  out  over  a  large  space  is  collapsed  and  made  nonlinear.  One  can  browse  through  books  by 
browsing  the  card  catalog.  You  can  go  from  one  document  on  one  side  of  the  library  to  another 
document on the other side by going from one card to the next.24

The  truer  the  model  is  to  the  collection,  the  more  you  can  do  in  the  model  and  the  less  you  have 
to  do  in  the  collection.  If  the  model  doesn’t  give  good  descriptions  of  the  books,  you  are  going  to 
have  to  keep  going  back  to  the  real  library  and  read  through  real  books  and  then  go  back  to  the 
model  with  additional  constraints. 

But what happened in the structured database management world? Originally there was a
collection: employees, departments, customers. But that is not what got modeled, because the
company started collecting data on its employees, departments, and customers. The data, whether it
was in file cabinets, manuals, old-style paper spreadsheets, or ledger books, started
getting out of hand and became very difficult to deal with.

Next came management information systems and database management, which created a data
model. The data model is not just one step removed from the collection (as was the information
storage and retrieval system), but two steps removed. The data was a representation of the real
thing: the employees, departments, and customers. The data model was a model of the data, not of
the domain. The model of the data is called meta-data.

On one hand, information storage and retrieval systems did something good. They didn’t introduce
meta-data. They didn’t introduce data into the system and then start modeling it. They modeled the
real thing. On the other hand, they did something bad: they haven’t progressed very far. They are
really still stuck back in the physical database era and haven’t progressed into the logical stage
yet. Database management systems, on the other hand, did introduce this extra layer of
meta-data and, as a result, the more highly evolved technology of logical database systems was
needed.25


Problem  of  Data:Meta-Data  Dichotomy 

The data:meta-data dichotomy is the distinction between the values in the database and the
structure of the database. Values in the database change often. The database management system
makes it easy to effect the changes. In contrast, meta-data are very difficult to change. Is this
distinction important? Meta-data concepts are a blessing and a curse. Their logical development
permits simple and powerful human-database interaction. On the other hand, because changing the
meta-data and the data model of the business enterprise is so difficult, meta-data are revised only
under the most extreme circumstances.

Consider  the  following  example  of  a  database  management  system.  A  company  determines  it 
needs  a  database  management  system  and  uses  data  modelers  to  design  the  database.  The  model 
designed  by  the  modelers  includes  an  employee  table  in  the  database  with  associated  meta-data 
appropriate  for  the  firm. 

Suppose the company’s circumstances change and it now needs contractors, but the original data
model is not designed to handle contractor personnel. How is this changed? It’s not like saying
John Doe was promoted, so change his department and change his salary in one place. Meta-data
can be changed only by unloading all the company data, rethinking the data model, creating a new
table type, and re-populating it. And, of course, the system will be down while the changes are
made. Often these types of changes are postponed until a "convenient time" which never arrives,
leaving the database in a state of patches and workarounds.26


Data-use:Data-design Dichotomy

When the distinction between meta-data and data was created, a new distinction between the use
and design of data was also enforced. Down at the data level, there are those who use, manipulate,
and maintain the data: the system users. And at a higher level are those who use, manipulate, and
maintain the meta-data: the data modelers, systems analysts, database administrators, and data
administrators. Most companies now have a Chief Information Officer (CIO) on a par with the
Chief Financial Officer (CFO). A caste system has been created based on how the database is
manipulated. Some people massage meta-data while other people massage data. They are very
separated, creating all sorts of problems for both communities in using the database.27

It must be remembered in talking about human-database interaction that data modelers as well as
end users are accessing the data. Not only do end users have difficulty and lose time in querying
the database, but data modelers, systems analysts, programmers, etc. lose productivity when they
must struggle with knowledge (or the lack thereof) of meta-data. In most instances, those talking
about natural language interfaces are referring to the end users. Rarely is the developer included.
Any new paradigm must also address the programmer productivity problem, either from the
standpoint of making it easier to access the meta-data or making programming more efficient. So
far, the programming issue has barely been addressed. This paper offers some suggestions in this
regard in its last section.28


5. DATA INDEPENDENCE


Returning to the time line of the evolution of database, Codd’s main jump with the Relational
Model was to go from physical to logical. Independence was an important aspect of the relational
model. Physical independence means that the physical location of data in a database is irrelevant
to the user. Tables can be reorganized to reclaim lost performance. It doesn’t matter to the user.
Back in the physical era it did matter, because suddenly users wouldn’t find what they were looking
for where they expected to find it. Now users are addressing a virtual database, a logical one. What
happens underneath it on the physical level doesn’t matter. But what happens if the data model is
revised to combine the salesmen and employees tables to reduce system complexity? Suddenly all
the applications that were written for those specific tables don’t work because they addressed the
logical table. They were dependent on what takes place on the logical level and now they won’t
work anymore.

Codd added a mechanism called a view definition. Views are virtual tables. So if you have an
employees table you create a view that’s just like it and all your applications address the view. If
suddenly two tables are compressed into one, two view definitions can be created. One of the
views would create a table that looks like the old employees table. The other view would create a
table that looks like the old salesmen table. Both views would come from the new employees
table. The application programs would then run unaffected, but only if the applications were
only doing queries. Unfortunately, the view definitions are only partial. It’s very hard to update
the view. Most file maintenance operations are not permitted in views and, in fact, theoretical
issues of updating views have not even been resolved. The primary problem is ambiguity.29
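
The two-view arrangement can be tried out with a small sketch. It uses SQLite through Python's
sqlite3 module purely as a convenient stand-in, not the systems the report discusses. The merged
table is named staff and given an is_salesman flag so that the views can keep the old table names;
both names, and the sample rows, are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Merged table after the redesign (hypothetical name and flag column).
    CREATE TABLE staff (emp_num INTEGER PRIMARY KEY, name TEXT, is_salesman INTEGER);
    INSERT INTO staff VALUES (3001, 'Doe', 0), (3002, 'Roe', 1);

    -- Views that look like the two old tables.
    CREATE VIEW employees AS SELECT emp_num, name FROM staff WHERE is_salesman = 0;
    CREATE VIEW salesmen  AS SELECT emp_num, name FROM staff WHERE is_salesman = 1;
""")

# Query-only applications keep working against the old names.
old_query_result = db.execute("SELECT name FROM salesmen").fetchall()

# Maintenance through the view is refused: SQLite treats plain views as read-only.
try:
    db.execute("UPDATE salesmen SET name = 'Poe' WHERE emp_num = 3002")
    update_refused = False
except sqlite3.OperationalError:
    update_refused = True
```

The query against the old name succeeds, while the update through the view is rejected, which is
the partial independence described above.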


Relational  View  Update  Problems 

Maintenance of tables through views can create problems of ambiguity. Consider the following
example. Suppose there are two real tables of departments and employees as shown in Figure 5.
There are several employees in Department 2 but only one in Department 1. A relational view
called emp_dept is created. Emp_dept joins the department and employee tables, showing which
employees are assigned to each department. Assume an update is made to the emp_dept view to
delete employee 3008 in Department 1.

The  following  command  is  issued  to  emp_dept: 

Delete  employee  3008 

What should the view do? Obviously employee 3008 is deleted in the employee table. This is
clear. But what happens in the department table? Since employee 3008 was the only employee in
Department 1, is that department also deleted? The situation is ambiguous, making the table not
updatable or deletable through the view emp_dept. So what the view does now is respond:

You  can’t  update  me. 

Adaptive  Layer  Around  Tables 

Adaptive  layers  could  be  used  to  resolve  update  ambiguities.  Consider  putting  an  adaptive  layer 
around  relational  database  structures.  Adaptive  mechanisms  should  be  used  to  resolve  update 
ambiguities  on  views.30  Continuing  the  example,  if  there  were  no  specific  rules  in  the  adaptive 
system  it  might  have  to  come  back  to  the  user  and  say: 

I don’t know what to do. Do you want me to delete the department or
not?


Figure 5 - Relational View Updates


But it wouldn’t say:


You can’t update me.


as it does today.


Or  the  database  developer  could  have  foreseen  such  a  possibility  and  when  they  developed  the 
application  with  this  special  kind  of  view  they  inserted  a  rule  that  said: 

In the event that one of the rows in emp_dept is deleted and there’s only one
corresponding row left in departments, keep it but log a message to this effect and
send mail to the appropriate manager telling him this department has no employees.

The relational database was supposed to have logical independence, but did not. With adaptive
layering around tables, technology can bring to the relational database something that it was
supposed to have to begin with. In going from the physical database era to the logical, physical
independence was achieved. An adaptive layer combined with these special kinds of
disambiguating views on top of the relational model would come close to achieving logical
independence, something that Codd wanted to achieve but hasn’t. Is this a signal of a new era on
the time line of the evolution of database? The time line went from physical databases that had no
physical independence to logical ones that had physical independence. Perhaps this is the
beginning of a new database system, one that has true logical independence.

Relational database developers still have not achieved complete logical independence. Physical independence is a two-sided coin. One side enables the DBA to reorganize the physical level without impacting the users or existing applications. The other side enables users to address the database without knowledge of any physical details. Logical independence should also have two sides. The adaptive views discussed above enable the developer to reorganize the logical level of the database design without impacting users or applications (provided the developer makes appropriate adjustments to the adaptive view definitions). But the second side of logical independence is absent: the user must still know the logical details of the database, its specific structure and content, in order to use it. The kind of adaptivity discussed first (e.g., conceptual pattern processing) can deliver this side of logical independence, but only if the database supports the adaptive mechanisms with thorough knowledge of the application domain and provides this knowledge in a flexible form that is useful to the adaptive mechanisms. The following sections of the paper consider what it will take to represent this real world knowledge.

6. MODELING THE WORLD


The  Relational  Model 

How  appropriate  are  current  database  systems  for  modeling  more  than  just  data?  To  answer  this 
we  begin  by  returning  to  relational  database  and  considering  just  what  its  elements  mean.31 


Interpretation  of  the  relational  model 

A serious deficiency in the relational model is the absence of a type hierarchy. The relational model consists of just tables with rows and columns, together with an interpretation of what the tables, rows and columns are. The table as a whole is considered an entity type. So you might have an employee entity type, a department entity type, or some other type. Some entities can actually be relationships or events, but they are considered to be entity types as well. If there are different kinds of employees - engineers, managers and sales people - they all have to go in one table. There’s no way of saying that different kinds of employees have special qualities about them. Each row is considered an actual entity and columns are interpreted as properties or attributes.

In extensions to the relational model (such as RM/T, for Relational Model/Tasmania, where Codd introduced it at a database conference), it’s possible to create tables with a hierarchy of types. For example, there could be entity types for employees, managers, engineers, and sales people.32

The main employees table would hold the columns for the attributes common to every employee. The other tables, such as managers, engineers, and sales, would be subtypes of the employees table. As shown in Figure 6, arrows drawn to employees from managers, engineers and sales show the hierarchical relationship. The subset mark denotes subtype relationships. For example, managers are a subtype of employees; or employees are a supertype of managers. This is more convenient because columns (or attributes) for commission or commission percent can be created in the sales table alone.


Figure  6  -  Employees:  Subtypes  and  Supertypes 

In the managers table, columns for rank or department managed could be created. Thus, the appropriate attributes can be distributed only where they are needed and not where they don’t belong (as in tracking commission for managers). But there is a problem with this scenario. A different example illustrates this in the following section.

Lack  of  true  inheritance  in  the  relational  model 

Inheritance,  as  implemented  in  the  relational  model,  requires  the  repetition  of  values  for  the 
inherited  attributes.  For  example,  consider  the  set  of  tables  shown  in  Figure  7.  There  is  one  table 
for  organisms.  Below  organisms,  there  is  a  subtype  for  animals.  Below  animals,  there  is  a 
subtype for primates and below that a subtype for humans. Take a property such as metabolism. Every human being has a metabolism based on respiration. That is true of humans, but it’s also true of primates and, in fact, of all animals, although it’s not true of plants, which use photosynthesis.


Figure  7  -  Organisms:  Subtypes  and  Supertypes 


As shown, these have been placed in a hierarchy of types and all are tables. Unfortunately, for every row in the humans table, RM/T requires a row in primates, a row in animals, and a row in organisms. Under organisms there’s a column named metabolism with the value "respiration" in the row for each animal. That value must be repeated for every animal and every human being. The model will not allow us to state once that animals respire in order to produce energy from food and have that property inherited by all subtypes of animals. This creates a tremendous amount of redundant information in the tables.
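
The redundancy can be made concrete with a small sketch. It assumes one dictionary per table and a shared surrogate id per individual; the table layout, the sample names and the helper `insert_human` are hypothetical and only mirror the hierarchy of Figure 7.

```python
# Hypothetical RM/T-style layout: one table per type, keyed by a shared id.
HIERARCHY = ["humans", "primates", "animals", "organisms"]  # subtype to supertype
tables = {name: {} for name in HIERARCHY}


def insert_human(surrogate, name):
    """Insert one human; every supertype table needs its own row as well."""
    tables["humans"][surrogate] = {"name": name}
    tables["primates"][surrogate] = {}
    tables["animals"][surrogate] = {}
    # The inherited value is stored again for every individual.
    tables["organisms"][surrogate] = {"metabolism": "respiration"}


for i, person in enumerate(["Ann", "Bob", "Cho"], start=1):
    insert_human(i, person)

rows_written = sum(len(t) for t in tables.values())
copies_of_value = sum(
    1 for row in tables["organisms"].values()
    if row["metabolism"] == "respiration"
)
print(rows_written, copies_of_value)  # 12 3
```

Three humans cost twelve rows and three stored copies of the same fact, and both counts grow linearly with the number of individuals.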

The rather rudimentary method of inheritance illustrated above is ingrained in the relational model. One might say, "So what? We’ll have the computer repeat the values for us transparently so it doesn’t take up a human being’s time." There are two problems with this.

Problem  1:  If  we’re  going  to  use  adaptive  mechanisms  we  need  a  more  efficient  form  of 
assigning  properties.  Remember,  we  need  to  have  numerous  terms  that  have  multiple 
meanings  throughout  this  whole  hierarchy. 

Problem 2: Things can very rarely be classified (which is the goal here) without exceptions popping up, and this is a lethal limitation of the relational approach.


Exceptions  are  forbidden  in  the  relational  model 

Consider another type hierarchy, from Touretzky, with grey things, elephants, and royal elephants. As shown in Figure 8, elephants are grey things and royal elephants are elephants. There is a problem, though: royal elephants are not grey; they are white. According to Codd, royal elephants would be a subtype of elephants, and elephants would be a subtype of grey things. Clyde would be represented by a row in royal elephants, hence he must also have a row in elephants and a row in grey things. But Clyde is really a white royal elephant. So the relational model, even in its maximally extended form, cannot allow for this exception.


Figure 8 - The Exceptions Problem in Database

Another formalism is needed, one that allows for typing and subtyping but doesn’t require storage of redundant information as in the human being example above, where every human being in the organisms table has to have respiration recorded as their energy producing mechanism.

A  completely  different  form  is  needed,  one  that  has  representation  of  exceptions  as  a  simple, 
natural  extension.  Artificial  intelligence  was  mentioned  earlier  as  a  major  topic  of  prehistory  that 
has  developed  along  a  path  independent  of  database  research.  There  are  several  areas  of  artificial 
intelligence that can be useful if applied to a new model of database management.

Semantic  Networks 

Semantic networks have long been a major topic in the artificial intelligence community as a method for showing type, supertype and subtype relationships through nodes, links and an inheritance mechanism. Semantic means meaningful, or having meaning. A network is a mathematical structure of nodes and links. Nodes are repositories of an abstraction from the model and links are arbitrary single line connections between nodes. Networks are a very general idea and there are numerous types.

The kind of network used in a semantic network is a directed acyclic graph (DAG). Directed means the links connecting the nodes are assigned a direction. They point from one node to another node in a specific way that remains unchanged for the life of the model. Acyclic means there are no closed paths in the network; that is, you cannot start at some node, follow the links in their assigned directions, and return to the starting point. A graph is a very general idea in mathematics which simply means any structure formed by nodes and links (not just the familiar x-y plots). A DAG can have a visual representation of the nodes and links (see Figure 9). The visual representation is optional (but very useful for human understanding) because the DAG can be represented in various mathematical forms, e.g. matrices. This last property makes DAGs especially useful for modeling, because the mathematical representations can be manipulated by computers, hence their popularity.
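
As a minimal sketch of the matrix form mentioned above, the following stores a small network as an adjacency matrix and checks that it is acyclic. The node names anticipate the elephant example; the matrix encoding and the helper `is_acyclic` are assumptions made for illustration, not something the source prescribes.

```python
# Node names follow the elephant example; the encoding is an assumption.
nodes = ["grey thing", "elephant", "royal elephant", "Clyde"]
index = {name: i for i, name in enumerate(nodes)}

# matrix[i][j] == 1 means there is a directed link from nodes[i] to nodes[j].
matrix = [[0] * len(nodes) for _ in nodes]
for src, dst in [("elephant", "grey thing"),
                 ("royal elephant", "elephant"),
                 ("Clyde", "royal elephant")]:
    matrix[index[src]][index[dst]] = 1


def is_acyclic(m):
    """Repeatedly remove nodes with no incoming link; a DAG empties completely."""
    remaining = set(range(len(m)))
    while remaining:
        free = [j for j in remaining
                if not any(m[i][j] for i in remaining)]
        if not free:
            return False  # a cycle exists among the remaining nodes
        remaining -= set(free)
    return True


acyclic_before = is_acyclic(matrix)              # True
matrix[index["grey thing"]][index["Clyde"]] = 1  # close a loop on purpose
acyclic_after = is_acyclic(matrix)               # False
print(acyclic_before, acyclic_after)
```

Because the whole network is just a matrix of zeros and ones, a program can test and manipulate it without any drawing of the graph.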

Inheritance  in  a  semantic  network  is  defined  in  such  a  way  that  properties  (attributes)  of  a  higher 
level  object  do  not  have  to  be  repeated  in  lower  level  objects.  These  inheritance  characteristics  are 
represented  by  the  directed  links  (also  called  arcs)  between  the  nodes  in  the  network.  Links  are 
given names such as "is-a" and "a kind of" (ako) showing the direction of inheritance. A detailed
discussion  of  inheritance  is  beyond  the  scope  of  this  paper. 

Inheritance  with  exceptions 

The semantic network can represent inheritance with exceptions as a natural extension of its form. Reconsider the elephant example, shown in Figure 9 below as a semantic network. There is a node for grey things, another for elephants, and a link denoting that elephants are a kind of (ako) grey thing; Clyde is shown as a kind of elephant. Furthermore, royal elephants are shown as a kind of elephant, and Clyde is shown as a royal elephant. Royal elephants are elephants, but they are not grey. The addition of an "is not a" link from Clyde to grey thing makes Clyde an exception to the property grey thing. But how do you tell which path is the inheritance path?17

David Touretzky developed the algorithm of inferential distance ordering to determine the inheritance path in cases like the one illustrated by Figure 9. For example, the distance of inferring that Clyde is not a grey thing along one path is 1 (one link away). The distance of inferring that Clyde is a grey thing along another inference path is 2 (two links away). Inheritance is defined along the inference path with the shortest inferential distance. Clyde is inferred not to be grey because that inference has the shortest path.

Figure 9 - Inheritance with Exceptions


There are examples, of course, where the inference isn’t clear. Consider the example shown in Figure 10, a semantic network of Quakers, Republicans, pacifists, and nonpacifists. Quakers are a kind of pacifist; Republicans are a kind of nonpacifist. Nixon is both a Quaker and a Republican. Is he a pacifist or a nonpacifist?

Notice the difference between these two examples, the elephants and Nixon. In the elephants example, there are two paths you can follow away from elephants to things that are more specific. You can go from elephants to a subclass, royal elephants, or you can go from elephants to a specific instance of an elephant, Clyde. But in Figure 10, Nixon inherits properties from two entirely different nodes. When an object inherits properties from multiple parents, it is called multiple inheritance. In this example, Nixon inherits properties from being a Republican and he inherits properties from being a Quaker. In other words, in multiple inheritance a single node (Nixon) has multiple parents (Republicans, Quakers).

If the inferential distance ordering algorithm is applied to the example of Nixon, it takes the same number of inferential steps to reach the inference that he is a pacifist as to reach the inference that he is a nonpacifist. In this case inferential distance leads to ambiguity, but at least the network accurately detects the ambiguity and fails gracefully. This then defines a viable method for handling exceptions.
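
The two cases can be sketched as follows. The code follows the link-counting description given in this section, which is a simplification; Touretzky's formal definition is not reproduced here. The dictionary encoding, the function names, and the modeling of "nonpacifist" as an "is not a" link to pacifist are all assumptions made for illustration.

```python
from collections import deque


def distances(is_a, start):
    """Count links from start to every node reachable over positive links."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in is_a.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def infer(is_a, is_not_a, start, target):
    """Answer 'is start a target?' with yes, no, ambiguous, or unknown."""
    dist = distances(is_a, start)
    yes = dist.get(target)
    no_paths = [d + 1 for node, d in dist.items()
                if target in is_not_a.get(node, [])]
    no = min(no_paths) if no_paths else None
    if yes is None and no is None:
        return "unknown"
    if no is None or (yes is not None and yes < no):
        return "yes"
    if yes is None or no < yes:
        return "no"
    return "ambiguous"  # both inferences are equally far away


elephants = {"Clyde": ["elephant", "royal elephant"],
             "royal elephant": ["elephant"],
             "elephant": ["grey thing"]}
clyde = infer(elephants, {"Clyde": ["grey thing"]}, "Clyde", "grey thing")

people = {"Nixon": ["Quaker", "Republican"], "Quaker": ["pacifist"]}
nixon = infer(people, {"Republican": ["pacifist"]}, "Nixon", "pacifist")

print(clyde, nixon)  # no ambiguous
```

Clyde is one link from "not grey" and two links from "grey", so the exception wins. Nixon is two links from each conclusion, so the sketch reports the ambiguity instead of guessing.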

Inheritance and exception handling in semantic networks are much superior to those of the relational model, because in order to say that everyone has respiration as their method of producing energy, you only have to attach the property in one place, the highest category to which it belongs. In the organism example that would be animal. Animals in general, and everything below animal, will inherit respiration unless it is cancelled by an exception. Another example is birds, which have the characteristic that they fly. There is a subclass of birds, penguins, that do not fly. The "fly" property is cancelled with a link saying: yes, they’re birds, but they don’t fly. This exception can be handled, and it can be used in reasoning.
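
A minimal sketch of attaching a property once and cancelling it lower down might look like this. The `parent` and `local` dictionaries, the placement of birds under animals, and the function `lookup` are hypothetical; they illustrate the idea rather than any particular system.

```python
# Hypothetical network: each node names its parent and the properties it sets.
parent = {"penguin": "bird", "bird": "animal", "human": "primate",
          "primate": "animal", "animal": "organism", "plant": "organism"}
local = {
    "animal": {"metabolism": "respiration"},   # stated once, at the top
    "plant": {"metabolism": "photosynthesis"},
    "bird": {"flies": True},
    "penguin": {"flies": False},               # exception cancels the default
}


def lookup(node, prop):
    """Walk up the ako links and return the nearest value set for prop."""
    while node is not None:
        if prop in local.get(node, {}):
            return local[node][prop]
        node = parent.get(node)
    return None  # nothing known about this property


print(lookup("human", "metabolism"))  # respiration
print(lookup("penguin", "flies"))     # False
```

The value "respiration" is stored in exactly one place, yet every animal below it can answer the question, and the penguin exception overrides the bird default simply by being nearer.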

Notice something else as well. In semantic networks there is no distinction between how data and meta-data are represented. An instance like Nixon looks just the same as a class like Quakers or Republicans. That means that the same person uses the same tools for working with data or meta-data. That’s a great advantage. It means that if you need to change employees into contractors you do it in one fell swoop. It also means that your whole model can evolve slowly over time. It doesn’t have to be created fully evolved, with all the meta-data for the system provided first and the data populated afterwards. It evolves naturally.

The problems of link based relationships


Link based relationships have two problems: 1) efficient implementation and 2) the creation of a new dichotomy. Properties and relationships are now links, while types are nodes. One might say, "They are different things, so why can’t we express them differently?" The problem is that all these nodes get modified and classified by how they fit in the whole network structure. The meaning of elephants is dependent on all of its subclasses, such as royal elephants, Indian elephants, or Canadian mountain elephants, and on all of its specific instances; it’s the instances that define everything.

You’ve probably heard it said that language is entirely arbitrary, and of course there are plenty of philosophers who want you to believe that. But it isn’t totally arbitrary. You can use any term you want to designate anything, but at some point your meaning comes from the real world: from entities, actions, relationships, experiences, and measurements in the real world. So every node gets its meaning by inheriting it from the way it sits in the whole network.

What about color? How do we say that color is in fact a subtype of the concept property, in fact a measurement made on certain electromagnetic radiation emitted by or reflected off an entity? We can’t hook an "a kind of" link up to a color link and hook it up to a supertype property link. The problem is that links don’t inherit meaning through the network the way nodes do. If color is to be treated like Nixon, color must sit in the whole network structure and inherit its meaning and definition from its position in the structure. To achieve this goal, consider again the relational model to analyze one of its strengths.40


Value  Based  Relationships 

The  genius  and  strength  of  the  relational  model  was  to  allow  the  data  to  drive  the  relationships. 
These  data  are  the  values  in  the  various  attribute  columns.  With  the  relational  model,  there  is  a 
collection  of  tables  linked  through  values.  Unlike  the  earlier  physical  database  era  systems  which 
relied  on  hard  connections  between  different  files,  whether  in  a  hierarchical  order  or  free-form 
network,  relational  table  links  are  not  pre-specified.  With  physical  files,  the  path  to  the  file 
location  had  to  be  known  and  it  had  to  be  specified  in  the  query.  But  it  wasn’t  really  a  query.  It 
was  a  program  in  COBOL.  That’s  why  when  a  report  was  needed,  a  programmer  wrote  a  program 
to  extract  the  necessary  data.  Often  these  programs  created  a  backlog  in  the  data  processing 
department  causing  queries  to  take  a  week  or  more  to  be  returned.  In  the  relational  model,  a  user 
with  knowledge  of  the  meta-data  can  do  equivalent  retrievals  with  a  single  line  command  in  a 
4GL.  The  following  paragraphs  examine  the  nature  of  values  and  how  they  are  used  in  the 
relational  model. 


Values  as  measurements 

A value is data entered into the attribute fields of a database system. In another scenario, a vision program might scan for data to pass to an artificial intelligence program for interpretation in the context of machine vision. Or chemical analysis data might be picked up through real time sensors as measurements for later analysis. Values can be direct measurements, as in real time sensing, or indirect measurements, such as an employee’s salary.41

Values that have been measured in terms of similar units may be compared and the comparisons used in establishing relationships in the data model. If they are not in similar units, but the units are known, a conversion between units can be done (e.g. comparing English pounds to American dollars through the monetary exchange rate). The values do not have to be in the same column but they do have to be of similar data types. One must compare apples to apples. Or, if comparing apples to oranges, there needs to be some means of conversion, which is possible only if there is some metaphysical similarity. Even if certain things are expressed in very different units, if they are metaphysically related then it’s possible to convert one to the other and then do the comparison.

The  importance  of  establishing  relationships  based  on  value  comparisons  is  that  all  possible 
relationships  do  not  have  to  be  anticipated  when  the  data  model  is  constructed.  The  relationships 
can  be  created  on  the  fly  by  a  user  with  knowledge  of  the  meta-data  of  the  model.  As  the  values 
change,  the  relationships  can  change. 


Implicit and explicit relationships

In relational database there are no pointers. There are just independent tables with values. If an employees table and a department table exist, every employee row could have a department number as one of its columns, among other columns. And in the department table, of course, there is a column for department number, department name, etc. Technically there are no explicit connections between the two tables. But there are many implicit relationships and it’s up to the queries to activate those implicit relationships.

For example, when someone is wearing blue pants and a blue shirt there is an implicit relationship, a similarity in color, between the two garments. It’s not explicit, it’s implicit, and humans recognize it unconsciously. A passer-by doesn’t have to anticipate it. Similarly, the strength of the relational model is grounded in the fact that it is value based. Many implicit relationships can be captured and extracted later by a user looking to discover them, without anyone explicitly programming them. The programmer is freed from having to anticipate every possibility and program for every contingency.

Consider  the  example  shown  in  Figure  11.  There  is  an  implicit  relationship  between  the 
department  column  in  the  employees  table  that  has  the  department  number  =  1  and  a  row  in  the 
department  table  that  has  department  number  =1.  At  some  future  time,  a  user  could  bring  this 
relationship  out  and  make  it  explicit  by  comparing  these  values  to  find  all  employees  who  work  in 
Department  1. 


Employees Relational Table

Emp Num    Department #    Salary
2001       1

Departments Relational Table

Department #    Department Name
1               R&D


Figure  11  -  Value  Based  Inference 
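
The comparison of values described for Figure 11 can be sketched as a simple equi-join. Employee 2001 and the R&D department come from the figure; the second employee, the second department, the field names and the function `employees_in` are invented here for contrast.

```python
# Rows mirror Figure 11; employee 2002 and "Sales" are invented for contrast.
employees = [{"emp_num": 2001, "dept_num": 1},
             {"emp_num": 2002, "dept_num": 2}]
departments = [{"dept_num": 1, "name": "R&D"},
               {"dept_num": 2, "name": "Sales"}]


def employees_in(dept_name):
    """Make the implicit relationship explicit by matching department numbers."""
    return [e["emp_num"]
            for e in employees
            for d in departments
            if e["dept_num"] == d["dept_num"] and d["name"] == dept_name]


print(employees_in("R&D"))  # [2001]
```

Nothing in either table points at the other; the relationship appears only when the query compares the two department number values.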


Value  based  inference 

In a different example, consider one table that has salaried employees with their salaries and another table that has the sales force with their end-of-year commissions. These two columns (salaries and total year-end commissions) happen to be of the same data type. They are both measured in the same units, i.e. dollars. Someone could come in (the database designer never having anticipated this) and, by comparing the values in their respective columns, determine who is making more money: the sales people on commission or the salaried people, including the chief executive officer. This is illustrated by Figure 12.


[Salaried Employees Relational Table, with a Salary column, and Sales Force Relational Table]

Figure  12  -  More  Value  Based  Inference 


By doing a relational join operation, one could make explicit the relationship between salary and commission. One might do a theta join looking for values related by inequalities, such as "Show me every salesman who earns more in total commissions than our directors earn in salary". That person is making explicit a relationship that heretofore had been implicit, existing only as values with no real physical connection.
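
A theta join of this kind can be sketched with a generic helper that pairs rows whenever a predicate holds. All names and dollar amounts below are invented for illustration; only the shape of the query comes from the text.

```python
# Invented sample data; both columns are measured in dollars.
salaried = [{"name": "director A", "salary": 90000},
            {"name": "director B", "salary": 120000}]
sales_force = [{"name": "rep X", "commission": 100000},
               {"name": "rep Y", "commission": 60000}]


def theta_join(left, right, predicate):
    """Pair every left row with every right row for which the predicate holds."""
    return [(a, b) for a in left for b in right if predicate(a, b)]


pairs = theta_join(sales_force, salaried,
                   lambda rep, boss: rep["commission"] > boss["salary"])
out_earners = [(rep["name"], boss["name"]) for rep, boss in pairs]
print(out_earners)  # [('rep X', 'director A')]
```

The predicate is an inequality rather than an equality, which is what distinguishes a theta join from the equi-join used to match department numbers.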


Value  Based  Semantic  Networks 

The authors have synthesized value based relational models with semantic networks to create value based semantic networks. This synthesis takes the best of the relational model and combines it with the best of semantic networks. This work grew out of the authors’ observation that relationships and attributes are second class citizens in semantic networks, but that there was a lot of good in semantic networks that could be made better by abstracting the value based approach from the relational model and using it to reshape them. This new class of model has been named Value Based Semantic Networks. These networks do not have links. They have nodes with values in them that express implicit relationships, and it’s up to the user or a computer program to make certain relationships explicit. The advantage is that they place properties, relationships and events on equal terms with entities and types.42


Value  based  semantic  networks  example 

Consider a room with a floor and ceiling as an example of the generic relationship of one entity located above a second entity - the above/below relationship. In the value based semantic network, a formalism is created that can model the hierarchy of relationships. Just as entity types, such as royal elephants, elephants and grey things, were modeled, relationships are given full status in the model as nodes. The relationships can then inherit properties in the same manner as entities. The relationship between the specific floor beneath our feet and the specific ceiling above our heads is a specific instance of the generic above/below relationship, just as this specific floor is an instance of the generic entity type floor.45

Figure 13 shows a value based semantic network. Starting at the bottom of the figure, node 1 is a reference to a specific floor. The term "specific floor" shown in Figure 13 is not necessarily stored in the node. The node could simply hold a reference to a synonym list elsewhere in the database that handles vocabulary.

Borrowing a concept from the relational model, a surrogate number for identification will be assigned to each of the entities and relationships in Figure 13. This number is assigned by the system and is unique. It is often referred to as a surrogate key. Because users never deal with surrogate keys and never change them, there is no referential integrity problem. These surrogate keys will be used throughout this example.


9. Conceptual Unit Relationship (8,3)

8. Concept - General Above/Below Relationship (5,4)

7. Conceptual Unit Relationship (5,2)

6. Conceptual Unit Relationship (4,1)

5. Concept - General Ceiling
4. Concept - General Floor

3. Specific Above/Below Relationship (2,1)

2. Specific Ceiling
1. Specific Floor


Figure  13  -  Value  Based  Semantic  Networks 


The specific floor in the model is 1 and the specific ceiling is 2. There is also a specific relationship between the two: the specific above/below relationship, 3. It carries two values, (2,1), which designate the entities related by the above/below relationship. This is a specific instance of an above/below relationship. Depending on how you read it, either 2 is above 1 or 1 is below 2.

Notice  these  nodes  are  not  physically  linked.  But  if  one  wanted  to  join  them  one  could  find  out 
that  our  specific  floor  is  in  fact  below  our  specific  ceiling.  Because  there  is  a  specific  above/below 
relationship,  the  model  will  require  a  general  above/below  relationship,  8.  Node  4  is  a  node  for 
the  general  concept  of  floor  and  the  general  concept  of  ceiling  is  5. 

A  very  special  recursive  relationship,  called  a  conceptual  unit,  creates  a  relationship  between  a 
concept  object  and  an  instance  of  the  object.  (1286,  1291)  This  special  relationship  replaces  the 
links of a semantic network. The two values in the relationship designate the "logical links" between the concept object and the instance of that object. As shown in Figure 13, conceptual unit 7 relates the general ceiling 5 to the instance of specific ceiling 2. Similarly, conceptual unit 9 relates the concept of above/below 8 to the specific instance of above/below relationship 3.
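
The nodes of Figure 13 can be sketched as plain records keyed by surrogate number. The field names ("kind", "label", "values"), the reading of each relationship pair as (upper, lower) and of each conceptual unit pair as (concept, instance), and the two helper functions are assumptions; the source does not specify a storage layout.

```python
# Nodes from Figure 13, keyed by surrogate number. Field names are assumptions.
nodes = {
    1: {"kind": "entity", "label": "Specific Floor", "values": ()},
    2: {"kind": "entity", "label": "Specific Ceiling", "values": ()},
    3: {"kind": "relationship", "label": "Specific Above/Below", "values": (2, 1)},
    4: {"kind": "entity", "label": "General Floor", "values": ()},
    5: {"kind": "entity", "label": "General Ceiling", "values": ()},
    6: {"kind": "conceptual unit", "label": "", "values": (4, 1)},
    7: {"kind": "conceptual unit", "label": "", "values": (5, 2)},
    8: {"kind": "relationship", "label": "General Above/Below", "values": (5, 4)},
    9: {"kind": "conceptual unit", "label": "", "values": (8, 3)},
}


def concept_of(instance):
    """Find the concept for an instance by matching values, not following links."""
    for node in nodes.values():
        if node["kind"] == "conceptual unit" and node["values"][1] == instance:
            return node["values"][0]
    return None


def what_is_above(entity):
    """Make the implicit above/below relationship explicit for one entity."""
    return [n["values"][0] for n in nodes.values()
            if n["kind"] == "relationship" and n["values"][1] == entity]


print(what_is_above(1))               # [2]
print(nodes[concept_of(3)]["label"])  # General Above/Below
```

The relationship node 3 is classified under the general relationship 8 in exactly the same way that the floor node 1 is classified under the general floor 4, which is the point of giving relationships node status.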


Figure  14  shows  the  problem  with  attempting  to  model  this  example  with  normal  semantic 
networks.  They  cannot  handle  the  inheritance  of  relationships  because  relationships  are  handled  by 
links  which  may  not  be  hierarchically  classified  in  the  network. 


Figure  14  -  Link  Based  Semantic  Networks 


We want representational formalisms that give us as much expressive power for dealing with relationships as they provide for handling entities: a relationship should be able to serve as an instance, a superclass or a subclass, and its properties should be inheritable. You can have very complex relationships involving many things at one given time. This formalism allows for this complexity while allowing exceptions and taking advantage of the value basis.

To implement this model in a relational database system, all nodes would be rows in various tables. Such a system would not be very efficient, because it would require numerous join operations, unless it runs on the right hardware.
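
The cost in joins can be seen in a small sketch using Python's standard `sqlite3` module. The three-table schema and its column names are hypothetical; they are one possible way to store the nodes of Figure 13 as rows, not a layout given in the source.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE above_below (id INTEGER PRIMARY KEY, upper_id INTEGER, lower_id INTEGER);
CREATE TABLE conceptual_unit (id INTEGER PRIMARY KEY, concept INTEGER, instance INTEGER);
INSERT INTO entity VALUES (1, 'Specific Floor'), (2, 'Specific Ceiling'),
                          (4, 'General Floor'), (5, 'General Ceiling');
INSERT INTO above_below VALUES (3, 2, 1), (8, 5, 4);
INSERT INTO conceptual_unit VALUES (6, 4, 1), (7, 5, 2), (9, 8, 3);
""")

# Two joins are needed just to ask: what kind of thing is above the specific floor?
row = con.execute("""
SELECT concept.label
FROM above_below AS rel
JOIN conceptual_unit AS cu ON cu.instance = rel.upper_id
JOIN entity AS concept ON concept.id = cu.concept
WHERE rel.lower_id = 1
""").fetchone()
print(row[0])  # General Ceiling
```

Even this tiny question needs two joins, and every further step up the hierarchy adds more, which is why the efficiency concern arises on conventional hardware.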


Hardware  Considerations 

The  purpose  of  this  study  was  to  discover  how  to  achieve  certain  objectives  of  human  database 
interaction  such  as  adaptivity  and  flexibility  in  order  to  avoid  query  failure.  To  do  that  a 
sufficiently  strong  representational  formalism  was  needed  in  the  database  structure.  Now,  taking  a 
further  step,  it  should  be  noted  that  to  do  value  based  semantic  networks  efficiently,  a  special 
purpose  hardware  platform  is  needed  to  support  these  very  expressive  flexible  formalisms.45 

A massively parallel processor is needed, in which each individual processor can hold one or more
of these value based node units and all of them can communicate simultaneously with one another to
make their implicit relationships explicit. For example, a computer like the Connection Machine
from Thinking Machines Corporation is needed. Additionally, value based relationships are better
suited for execution of queries in parallel than relationships represented with a link-based
formalism (e.g., "Find all sons who are hated by their fathers.")44


7. INCOMPLETE AND UNCERTAIN INFORMATION


There is one kind of representational problem that the preceding formalism is not going to solve.
It is probably the most difficult problem in database management modeling, and therefore in
human-database interaction: the area of incomplete and uncertain information. It goes a lot deeper
than can be handled with null values for missing information or a certainty threshold like that in
an expert system, because not only can data be uncertain, structure can be uncertain and, above
all, the domain can be uncertain.

At some point database management is going to face a crisis similar to what physics faced in the
late 1800s and early 1900s, when there were domains whose gross superficial behavior could be
described but which could not be accurately divided into components and the ways those components
relate; they could not be particularized. Even the latest technology, object oriented programming,
takes a particle view of data. The value based semantic network approach is also a particle based
approach. It is not the ultimate solution. If you wonder how the quantum domain has any
relationship with anything you normally deal with, think about the problem of corporate
information managers and IRM (Information Resource Management). How does the corporation take the
aggregate of business functions and understand it as a collection of individual discrete
components that interact with each other and work together? The boundaries are fuzzy and foggy.47


8. THE NEW INTERFACE PARADIGM


A natural language interface, even under the best possible conditions, would not give a
human-database interaction as effective as human-human interaction, because human-human
interaction isn’t just verbal; it’s verbal and visual. It relates to a physical context, not just
a verbal context. This means that if human-database interaction is to be as effective as
human-human interaction or, better yet, more effective, it can’t be constrained by the verbal
medium or, even worse, the textual medium. The interface needs to be opened up to all human
senses, to an experiential meeting with the database. In the following sections, this study will
offer approaches to designing a new interface paradigm based on value based semantic networks in
order to make use of full experiential interaction with the database.


Human  Beings  &  Information  Processing 

Human  beings  process  information  on  three  simultaneous  levels:  the  sensorial  level,  the  perceptual 
level,  and  the  conceptual  level.  To  some  extent  man  operates  on  the  sensorial  level.  He  is  not 
generally  aware  of  sensations  all  by  themselves  but  he  can  be  under  suitable  conditions  such  as 
psychological  experiments  involving  the  senses.  He  is  aware  of  one  sensation  as  opposed  to  the 
nothingness  of  a  following  sensation  under  the  right  conditions.  But  even  without  higher  brain 
functioning,  the  lower  brain  can  take  sensations  and  abstract  (in  other  words,  make  explicit) 
relationships  that  are  in  them.  And  that  is  a  perceptual  level  which  gives  rise  to  perceptions  as 
opposed  to  simple  sensations.  Notice  that  animals  (although  they  are  not  intelligent  in  an  abstract 
sense) are extremely good at doing perceptual tasks. A driver wouldn’t let go of the steering wheel
in a car driving through a forest, but a rider can let the reins loose with a horse. The horse is
not going to run into a tree.48

Beyond that, man is also able to abstract from his perceptions and form his concepts, his
conceptual apparatus. Despite what Immanuel Kant has said, this conceptual chain is how he gets
around in the world and how he thinks. Despite what a lot of philosophers and psychologists would
say, man is not operating purely conceptually. It is not as if he starts as a baby (sensorial),
moves up as a youngster (perceptual), and ends up living only on the conceptual plane as an adult.
Man in his world operates on all three planes simultaneously. This is the Conceptual Chain.

As  a  gedanken  experiment,  try  to  think  of  the  concept  elephant  without  thinking  about  some  part 
of  its  perceptual  appearance.  Try  to  think  of  a  perception  without  a  concept.  It’s  all  interlinked 
together.  For  this  reason,  there  are  two  important  things  to  consider. 

The first is that humans need to be able to interact with databases on all levels. To the
database, the sensorial level is the area of measurement and data values. The conceptual level in
relational databases is the database schema and meta-data. Right now there is no perceptual level;
it has been stripped away because designs have gone from the real world to data and then to
meta-data. The connecting medium has been thrown out. It would be beneficial if humans could
interact with the database on all levels. That means, for example, that a user who wants to make a
reference to something shouldn’t have to limit his references to terms; he might use perceptions.

The second is that it might help to architect database systems with senses that emulate human senses.
This  would  require  the  integration  of  disparate  hardware  technologies  such  as  vision.  All  of  man’s 
high  level  concepts  are  ultimately  related  to  measurements  taken  in  reality.  There  is  hardware  for 
taking  direct  measurements  in  reality  and  assembling  them  and  making  sense  of  them.  Neural 
networks  and  genetic  algorithms  processing  signals  from  vision  or  aural  devices  might  be  used  for 
the  sensorial  and  perceptual  levels  of  the  database.45 

By utilizing the conceptual chain, researchers have a way of tying vision and visual or graphical
material (drawings, pictures, etc.) into the knowledge network that is represented in the value
based semantic networks. A person doing a database query should be able to use not just natural
language but also to point to a picture, with the picture having meaning in the database. The
value based semantic network shows how items are related to other nodes in the network. In the
case of the ceiling and floor example, the specific ceiling has certain characteristic properties
which can be measured and their values put in the database, but the images could be there as
well.50

If  someone  said  in  a  query: 


"I want to know what is below that" (pointing at a picture that is on their screen)


the system could trace the picture back to the node it represents, find out that it happens to be
a specific ceiling, follow that ceiling’s specific relationships to find out what things are below
it, and either tell the user or bring up pictures.51
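
A minimal Python sketch of how such a pointing query might be resolved follows. Everything in it
is an assumption made for illustration: the picture identifiers, the lookup dictionaries, the
AboveBelow class, and the choice to follow above/below relationships transitively are not taken
from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AboveBelow:
    """One specific above/below relationship between two named things."""
    upper: str
    lower: str


# Hypothetical data: which network node each on-screen picture depicts.
picture_to_node = {"img-017": "ceiling-1", "img-018": "floor-1", "img-019": "table-1"}
node_to_pictures = {"ceiling-1": ["img-017"], "floor-1": ["img-018"], "table-1": ["img-019"]}
relationships = [AboveBelow("ceiling-1", "table-1"), AboveBelow("table-1", "floor-1")]


def below(node):
    """Everything below `node`, following above/below relationships transitively."""
    found, frontier = [], [node]
    while frontier:
        current = frontier.pop()
        for rel in relationships:
            if rel.upper == current and rel.lower not in found:
                found.append(rel.lower)
                frontier.append(rel.lower)
    return found


def what_is_below(pointed_picture):
    """Resolve the pointed-at picture to a node, then answer with names and pictures."""
    node = picture_to_node[pointed_picture]
    return {name: node_to_pictures.get(name, []) for name in below(node)}


answer = what_is_below("img-017")
```

The answer maps each thing found below the ceiling to the pictures the system could bring up for
it, so the same result serves both a textual reply and a visual one.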

Self organization, with the database autonomously accepting information and organizing it, is not
necessary at this stage; it would require some kind of neural network or adaptive algorithm.
The system would still have people putting the information in, but in a much friendlier and much
more expressive environment, and at a perceptual level of object description. Users are not
dealing with data anymore; they are dealing with the world again.52


Leveraging  the  Human  Synergy 

Consider one more area of opportunity. Man operates on three levels: sensations, perceptions, and
conceptions. He is not just a perceiver and receiver. He also participates in his environment. He
is extremely good at fine motor control, with excellent hands, feet, and legs for doing things. A
good case in point is the operator of a tractor or construction equipment, who uses most parts of
his body in operating the machine. Right now our database system users are using their fingers on
a keyboard and maybe one hand operating a mouse, or maybe a finger on a touch screen. However, if
the effectiveness of human-database interaction is to be really improved, interfaces must be
created which utilize all the interface options that a human being offers.55


The  Road  in  the  Sky 

An example of one situation that takes advantage of man’s conceptual, perceptual, and motor
control is the "Road in the Sky" example. Consider the typical fighter plane where the pilot has
to decipher many gauge readouts, make computations based on these, and look at various devices
that determine how far he can go, where he can go based on wind currents, how much fuel he has
left, where there might be enemy territory, where he has to fly to avoid radar, etc. Why should he
have to decipher all of this when the computer onboard the plane can instead take these readings,
make the calculations, and project on the canopy in front of him, in the sky, a road with
representations of enemy areas, etc., showing him where to fly? He can now basically fly by a
computerized seat of the pants, much as the aces during WWI would, but under much more demanding
conditions than WWI fighters ever encountered.


Database Access


Of course, the subject here is not pilots but users accessing databases, and those users should be
able to do something more like the road in the sky. They should not have to deal with data but
with the domain of the world that they are investigating: not with a model of data but with a high
level model of what they are interested in. If that particular domain makes use of sight, sound,
touch, and smell, the interface should utilize all those senses.

Hardware technology is evolving rapidly in the areas of transceivers and transducers. There is
currently an interface using a glove that an operator wears. It has detectors to sense the
wearer’s hand positions and arm movements and reproduce these on a screen in front of him, to the
extent that with his gloved hand he can reach out and pick up objects that are on the screen, not
in reality. It’s not quite the Holodeck on the Enterprise in Star Trek: The Next Generation, but
it’s heading in that general direction. The important point is that to support this, once again,
we need the representational formalism underneath that we’ve been discussing, and we need the
hardware support under it to make it fast enough and give us the performance.54




9.  THE  NEXT  STEP 


The analysis and synthesis of this study have produced promising directions. In this section a future
development  model  will  be  discussed  that  provides  directions  for  future  research  in  database 
management  techniques.  As  shown  in  Figure  15  there  are  three  different  directions  that  need  to  be 
pursued  based  on  the  analysis  of  this  study.  Like  previous  eras  of  database  evolution,  there  are 
two  increments  of  development:  an  initial  increment  of  new  concepts  and  a  subsequent  increment 
of  cross  fertilization  and  maturation.55 


Figure 15 - Research and Development Matrix


The first direction recognizes that value based semantic networks are not implemented right now.
Current technology consists of fully relational database management systems and fairly
sophisticated on-line information storage and retrieval systems. The next step should take these
and build on them as much as possible. In other words, take the best existing foundations and
build on them in promising directions.

The  second  direction  implies  the  best  of  the  existing  foundations  has  many  weaknesses  and 
ultimately  will  be  insufficient  to  achieve  fully  integrated  interface  goals.  Researchers  need  to  start 
work  on  constructing  a  more  solid  foundation.  A  suggested  model  for  this  direction  is  the  value 
based  semantic  network. 

The third direction is a new interface paradigm. Up to now, all database users have interfaced
with data through data models. An interface paradigm is needed at a higher level. It is not the
information level of the data, information, knowledge, wisdom progression, but it is at a higher
level.


Adaptive, Active Thesauri

Direction 1 in Figure 15 is split into two halves. One focuses on structured database management
systems, the other on unstructured information storage and retrieval systems. The goal in
increment 1 is to make these systems more adaptive and to prevent or offset query failure and, in
the process, to make them more flexible and easier for developers and users to use. Ultimately,
the data and meta-data dichotomy must be synthesized.

How can this be accomplished? The best foundation is the concept of a thesaurus. A thesaurus, at
its best, is a network: not just a tree, but a directed acyclic graph taking terms or concepts and
relating them to each other through narrower term, wider term, and related term links.
Ultimately, a thesaurus can have one concept with multiple terms attached to it, and one term
related to multiple concepts. So far all the current thesauri operate manually. For example,
suppose a medical researcher is investigating the term bronchitis. If he tried a query with
bronchitis but didn’t get what he wanted, he could look it up in the thesaurus and say "I need
something more general", go up the wider term link to find something more general, and substitute
that into the query. Or maybe he got too many responses to the query. He could then go down a
narrower term link.56
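
The manual procedure just described can be sketched as code. The thesaurus fragment, the
document index, and the function names below are hypothetical and chosen only for illustration;
this is not how Proto-Atlas or any particular retrieval system is implemented.

```python
# Hypothetical thesaurus fragment: each term maps to its wider terms.
wider = {
    "bronchitis": ["respiratory tract disease"],
    "acute bronchitis": ["bronchitis"],
    "chronic bronchitis": ["bronchitis"],
    "respiratory tract disease": ["disease"],
}

# Narrower term links are the inverse of the wider term links.
narrower = {}
for term, parents in wider.items():
    for parent in parents:
        narrower.setdefault(parent, []).append(term)

# Hypothetical document index: term -> identifiers of matching documents.
index = {
    "respiratory tract disease": {1, 2},
    "acute bronchitis": {3},
}


def search(term):
    return index.get(term, set())


def broaden(term):
    """Walk wider term links breadth-first until some term returns hits."""
    queue, seen = [term], set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        hits = search(current)
        if hits:
            return current, hits
        queue.extend(wider.get(current, []))
    return None, set()


def narrow(term):
    """Offer each narrower term together with its own, smaller result set."""
    return {child: search(child) for child in narrower.get(term, [])}
```

With this data a failed query on "bronchitis" is retried automatically through its wider term,
and narrow("bronchitis") lists the more specific terms a user could pick when a query returns too
much.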

The thesaurus could be automated, as has been shown in Proto-Atlas, by implementing a value
based semantic network. Proto-Atlas offsets query failure, is flexible, and doesn’t require users
to specify or have knowledge of meta-data. It doesn’t force users to know exactly what they are
after, because if they did know, they wouldn’t be using the database.57


Adaptive  Relational  Database 

In  the  area  of  database  management  systems,  the  strongest  foundation  available  is  relational 
database  management  systems.  Near  fully  relational  databases  are  currently  available  such  as: 
IBM’s  DB2,  release  2;  Sybase;  Oracle;  etc.  These  database  managers  meet  most  if  not  all  of 
Codd’s  rules  for  relational  databases,  supporting  concepts  such  as  referential  integrity  and  views. 
Within  a  year  of  today,  there  will  likely  be  quite  a  few  products  available  which  are  fully 
relational. The need is to encapsulate such a database system with an adaptive layer.58

How can the adaptive layer on top of relational databases be accomplished? One rich collection of
methods is object-oriented programming paradigms. These seem to be the best available methods at
the moment because of the code reusability they encourage: being able to take objects
(self-contained packages of program code) and allow other people to use them and embed them into
other applications. Clever programmers take them and put them to more specific uses than the one
for which they were originally intended.


Using  today’s  distributed  architecture 

Consider the relational database and the example of the employee table. Assume it is running on a
general purpose machine running a database management system such as DB2. Assume also front end
interface machines at the user location, such as 386 workstations like the IBM PS/2 Model 80 or
Motorola 68000 based workstations such as a Sun or Macintosh, running interface programs.
Smalltalk-80, an object oriented programming environment, could be used to provide much of the
functionality described in this study for the front-end interface. With DB2 running on the back
end and efficient network hardware and software connecting the two, an environment exists which
is exploitable in today’s world.59

An object oriented environment can have hierarchies. A window class can have a subclass for data
entry forms, which in turn has a subclass for an employee entry form. Someone programs the code
for a general purpose window object, and from then on it exists in an object library for all to
use. Someone else says "I need a data entry form and it needs to be a window." They take the
window object, create a subtype of it, and add more code to it. Then someone says "I really need
to maintain equipment data." They take that data entry window class and create another subtype,
one for entering equipment data. When an application demands one of these, an instance of the
object is made to support the application.40

Notice the specific screen is not rewritten from scratch. Existing code is taken from the library
and extended to meet the new requirement and provide the specific functionality. Because the
employee data object is lower down the inheritance hierarchy, it inherits behavior from the window
class higher up. The objects are not only inheriting their properties, as in a semantic network;
they are also inheriting their code.
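
The hierarchy can be sketched as follows. Python is used here in place of Smalltalk-80, and the
class names, field names, and the text-only draw method are assumptions made for illustration.

```python
class Window:
    """General purpose window, written once and kept in the object library."""

    def __init__(self, title):
        self.title = title

    def draw(self):
        return [f"[window] {self.title}"]


class DataEntryForm(Window):
    """A window that also collects values for a fixed list of fields."""
    fields = ()

    def __init__(self, title):
        super().__init__(title)
        self.values = {}

    def enter(self, field, value):
        if field not in self.fields:
            raise KeyError(field)
        self.values[field] = value

    def draw(self):
        lines = super().draw()   # reuse the inherited window code
        lines += [f"  {name}: {self.values.get(name, '')}" for name in self.fields]
        return lines


class EmployeeEntryForm(DataEntryForm):
    fields = ("name", "department")


class EquipmentEntryForm(DataEntryForm):
    fields = ("serial_no", "location")


# An application asks for an instance of the subtype it needs.
form = EquipmentEntryForm("Equipment")
form.enter("serial_no", "X-100")
screen = form.draw()
```

The two specific forms add only a list of fields. The entry and drawing behavior comes from the
classes above them.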


Exploiting  the  object-oriented  paradigm 

The object-oriented paradigm can be exploited to communicate with the database. Suppose a table
object is created. The table object has embedded in its programming code whatever calls it needs
to make queries and send commands to the database system over the network and to format the data
which returns. But to the programming environment, Smalltalk for instance, it doesn’t look any
different from the actual table. This creates a virtual database table in the Smalltalk object
oriented environment. Other things can be done with that table object, and the paradigm can create
something more sophisticated. If the underlying database is not fully relational (perhaps it
doesn’t support referential integrity or domains), a special relation object could be created
which looks like a table, refers to the table object, and adds functionality to it. In this way
the table object has been used to create a higher level object that relates to it and adds
functionality.61
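
A rough Python sketch of the table object and the relation object follows. An in-memory SQLite
database stands in for the back-end database machine, and the Table and Relation classes, their
methods, and the example tables are hypothetical. Table and column names are assumed to come from
trusted code, since they are placed directly into the SQL text.

```python
import sqlite3


class Table:
    """Front-end object that hides the SQL sent to the back-end database."""

    def __init__(self, conn, name, columns):
        self.conn, self.name, self.columns = conn, name, columns
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({', '.join(columns)})")

    def insert(self, **row):
        names = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        self.conn.execute(
            f"INSERT INTO {self.name} ({names}) VALUES ({marks})", tuple(row.values())
        )

    def select(self, **where):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if where:
            sql += " WHERE " + " AND ".join(f"{key} = ?" for key in where)
        return self.conn.execute(sql, tuple(where.values())).fetchall()


class Relation:
    """Looks like a table, but adds a referential integrity check in the front end."""

    def __init__(self, table, foreign_keys):
        self.table = table
        self.foreign_keys = foreign_keys   # column -> (parent Table, parent column)

    def insert(self, **row):
        for column, (parent, parent_column) in self.foreign_keys.items():
            if not parent.select(**{parent_column: row[column]}):
                raise ValueError(f"{column}={row[column]!r} has no match in {parent.name}")
        self.table.insert(**row)

    def select(self, **where):
        return self.table.select(**where)


conn = sqlite3.connect(":memory:")
department = Table(conn, "department", ["dept_id", "name"])
employee = Relation(
    Table(conn, "employee", ["emp_id", "name", "dept_id"]),
    {"dept_id": (department, "dept_id")},
)
department.insert(dept_id=10, name="Research")
employee.insert(emp_id=1, name="Ada", dept_id=10)
try:
    employee.insert(emp_id=2, name="Bob", dept_id=99)   # no such department
    rejected = False
except ValueError:
    rejected = True
```

Callers use the relation object exactly as they would the table object. The integrity check is
added in the layer above the table, without changing the back end.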

The process of building objects can continue to higher levels. A special purpose relation could
be added. For example, suppose the application requires the tracking of dates and times, that is,
temporality. Temporality is not a basic part of the relational model, but a class of objects can
be created that have temporality as a basic feature. If a special table that is temporal is
needed, an instance of this class is made; it will in turn make instances of the relation object
and table object and define the SQL queries on the back-end database machine. The interface sends
messages in the object oriented environment to the special purpose relations.62
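
One way such a temporal class might look is sketched below. The TemporalTable class, its
valid_from and valid_to columns, and the open-ended sentinel date are assumptions made for
illustration, and SQLite again stands in for the back-end database. For brevity the class issues
its SQL directly instead of going through separate relation and table objects.

```python
import sqlite3


class TemporalTable:
    """Keeps every version of a value, each stamped with a validity interval."""

    OPEN = "9999-12-31"   # sentinel meaning "still current"

    def __init__(self, conn, name, key, attribute):
        self.conn, self.name, self.key, self.attribute = conn, name, key, attribute
        conn.execute(
            f"CREATE TABLE {name} ({key}, {attribute}, valid_from TEXT, valid_to TEXT)"
        )

    def record(self, key_value, value, on):
        """Close the current version, if any, and open a new one starting `on`."""
        self.conn.execute(
            f"UPDATE {self.name} SET valid_to = ? WHERE {self.key} = ? AND valid_to = ?",
            (on, key_value, self.OPEN),
        )
        self.conn.execute(
            f"INSERT INTO {self.name} VALUES (?, ?, ?, ?)",
            (key_value, value, on, self.OPEN),
        )

    def as_of(self, key_value, date):
        """Return the value that was valid on `date` (ISO dates compare as text)."""
        row = self.conn.execute(
            f"SELECT {self.attribute} FROM {self.name} "
            f"WHERE {self.key} = ? AND valid_from <= ? AND ? < valid_to",
            (key_value, date, date),
        ).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")
salary = TemporalTable(conn, "salary", "emp_id", "amount")
salary.record(1, 30000, on="1989-01-01")
salary.record(1, 33000, on="1990-01-01")
```

The application only sends record and as_of messages. The interval bookkeeping and the SQL stay
inside the class.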

The next logical step is a composite object. A composite object is a special purpose relation
object with a small expert system that does disambiguating. The composite resembles a view but
resolves view update problems: it is updatable and deletable. The expert system has special rules
in it for disambiguating as desired. The rules could be stored in a special dictionary in the
back-end database machine and made callable from the front-end object-oriented interface. This
composite object can contain the adaptivity code, so that if a query is put to the composite and
something is spelled wrong, or the user unknowingly violates real world constraints, it can handle
the problem with conceptual pattern processing, disambiguation, spelling checking, or whatever is
required.61
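
The adaptivity code can be illustrated with a small sketch. It is not the study's conceptual
pattern processing, which is not specified here. Approximate string matching from Python's
standard difflib module is swapped in for the spelling check, and plain functions stand in for the
expert system rules. The Composite class and the sample data are hypothetical.

```python
import difflib


class Composite:
    """View-like object that repairs near-miss queries instead of failing."""

    def __init__(self, rows, rules=()):
        self.rows = rows           # list of dicts standing in for the joined tables
        self.rules = list(rules)   # callables returning a problem string or None

    def query(self, column, value):
        """Return (matching rows, notes about any repairs that were made)."""
        notes = []
        columns = list(self.rows[0])
        if column not in columns:
            guess = difflib.get_close_matches(column, columns, n=1)
            if not guess:
                return [], [f"unknown column {column!r}"]
            notes.append(f"column {column!r} read as {guess[0]!r}")
            column = guess[0]
        for rule in self.rules:
            problem = rule(column, value)
            if problem:
                return [], notes + [problem]
        known = sorted({str(row[column]) for row in self.rows})
        if str(value) not in known:
            guess = difflib.get_close_matches(str(value), known, n=1)
            if guess:
                notes.append(f"value {value!r} read as {guess[0]!r}")
                value = guess[0]
        hits = [row for row in self.rows if str(row[column]) == str(value)]
        return hits, notes


def no_negative_age(column, value):
    """A real world constraint expressed as a rule."""
    if column == "age" and int(value) < 0:
        return "age cannot be negative"


employees = Composite(
    [
        {"name": "Smith", "department": "Research", "age": 41},
        {"name": "Jones", "department": "Sales", "age": 35},
    ],
    rules=[no_negative_age],
)
hits, notes = employees.query("departmnt", "Reserch")
```

The misspelled query still finds the Research employee, and the notes tell the user how the
request was interpreted instead of failing silently.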

Taking the analysis a little further, a perspective object can be created. A perspective object
references the composite object but adds a visual appearance (format) to it. One subclass of a
perspective might present a visual in an Excel or Lotus spreadsheet. A different perspective
object might produce a printed report appearance. The advantage here is that a database designer
could create a number of different composites, a number of different special purpose relations,
and a number of different perspectives, and we now have a whole different methodology of designing
databases. Instead of developing database applications from scratch, designers say "I have a new
application and it requires invoices. Do I create a new screen form from scratch? No. I have the
underlying tables in the database. I pull out composite classes that have header and line items. I
might change a few things in the code when I make an instance of it for the specific application.
I pull out an existing perspective, and I have it." The programmer is not generating new stuff; he
is taking old stuff and putting it together in a somewhat new way. This is an altogether different
method of designing, developing, and coding.64
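
A perspective can be sketched as a small class hierarchy in which each subclass supplies one
format. The class names and the invoice data are hypothetical, comma-separated text stands in for
a spreadsheet, and a plain function stands in for the composite object that would supply the rows.

```python
import csv
import io


class Perspective:
    """Adds a presentation to whatever supplies rows; subclasses pick the format."""

    def __init__(self, source):
        self.source = source   # any callable returning a list of dicts

    def render(self):
        raise NotImplementedError


class SpreadsheetPerspective(Perspective):
    """Comma-separated text that a spreadsheet program could load."""

    def render(self):
        rows = self.source()
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()


class ReportPerspective(Perspective):
    """A printed report layout: one labeled block per row."""

    def render(self):
        rows = self.source()
        width = max(len(name) for name in rows[0])
        blocks = [
            "\n".join(f"{name.ljust(width)} : {row[name]}" for name in row)
            for row in rows
        ]
        return "\n\n".join(blocks)


def invoice_lines():
    """Stand-in for a composite object that returns invoice line items."""
    return [{"item": "bolt", "qty": 100}, {"item": "nut", "qty": 250}]


sheet = SpreadsheetPerspective(invoice_lines).render()
report = ReportPerspective(invoice_lines).render()
```

The same source feeds both perspectives, so a new application can pair an existing composite with
an existing perspective instead of writing a screen or report from scratch.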


Value  Based  Networks  and  Visual  Interface 

During  this  initial  increment,  something  must  be  done  about  direction  2:  research  and  implement 
value  based  semantic  networks.  Design  new  code  and  implement  them  on  a  small  scale.  Then,  as 
increment  1  nears  finishing,  the  experience  with  adaptivity  can  be  utilized  and  placed  on  top  of  a 
stronger  foundation.  Flexibility  can  then  be  added  to  permit  various  kinds  of  references, 
metaphors,  analogies,  and  examples.64 

There is also the issue in the new interface paradigm of exploiting all of man’s strengths: an
attempt to make human-database communication even better than human-human communication, i.e.
better than natural language. If the visual aspect can be added, a step will be taken toward that
goal. Instead of a textual thesaurus, why not have a visual thesaurus of nodes and links, a system
for editing, querying, and manipulating active thesauri? If you have a number of terms you want to
explore, you can mouse on all of them, make them active, and ask the network to process them. If
they happen to be incompatible and cause query failure, the system comes back using conceptual
pattern processing and shows you what the remaining trade-offs are visually, rather than having
you enter text and read it. The system is now guiding the user from a state of little knowledge to
a state of more knowledge, as it should.66
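
One simple way to compute such trade-offs is sketched below. It is not the conceptual pattern
processing referred to above. A plain search over subsets of the active terms is swapped in, and
the index and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical index: term -> identifiers of matching records.
index = {
    "bronchitis": {1, 2, 3},
    "children": {2, 3, 4},
    "antibiotics": {1, 5},
}


def matches(terms):
    """Records that satisfy every term in `terms`."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def trade_offs(active):
    """If all active terms together fail, list the largest subsets that succeed."""
    active = sorted(active)
    for size in range(len(active), 0, -1):
        options = {combo: matches(combo) for combo in combinations(active, size)}
        options = {combo: hits for combo, hits in options.items() if hits}
        if options:
            return options
    return {}


choices = trade_offs({"bronchitis", "children", "antibiotics"})
```

All three terms together match nothing, so the result lists the two-term combinations that do
return records. A visual interface could present these as the remaining choices.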


The  Fourth  Database  Era  -  The  Era  of  General  Modeling 

In  increment  2,  cross-leveraging  and  cross-fertilizing  can  start.  The  adaptive  object-oriented  system 
can  be  placed  on  top  of  the  value  based  semantic  network.  New  features  can  be  added  to  it: 
exception handling, additional flexibility, and more visuals.

What is the result of direction 1, increment 2, when there is a well developed adaptive layer on a
relational database management system? The authors believe it is a new modeling formalism. SQL and
data sublanguages do not spring out of nowhere. A language is not a primary; it is a secondary.
The primary is a modeling formalism. Codd created relational tables with rows and columns and as a
consequence designed a language. The first languages, the relational calculus and the relational
algebra, later evolved into SQL. In a similar manner a modeling formalism is proposed here,
dealing not with specific relations but with composites and perspectives. Our modeling formalism
will eventually be extended to bring in the value based semantic formalism. Obviously, a visual
interface will be a necessary part of the formalism. This new model can also have a command level
interface which, with its adaptivity and conceptual orientation, is a significant step beyond a
data sublanguage.67

Users  are  no  longer  dealing  with  a  data  sublanguage,  they  are  dealing  with  something  higher:  an 
information  sublanguage  that  is  a  modeling  language  designed  for  higher  and/or  more  appropriate 
levels  of  modeling.  Whatever  method  is  created  for  interfacing  visually  with  the  database  is  a 
sublanguage  of  its  own.  It’s  a  visual  language  for  dealing  with  the  database  models.  This  paper 
has  proposed  a  different  methodology  based  on  dealing  with  object-oriented  composites  and  reusing 
existing  composites  to  form  a  value  based  semantic  network.  This  approach  uses  special  relation 
objects  containing  expert  systems  to  provide  an  adaptive,  flexible  user  interface.  A  significant 
advantage  of  this  approach  is  the  reusability  of  code  written  in  the  object-oriented  languages. 
Object-oriented  code  can  easily  be  modified  to  create  new  objects.  The  more  code  is  developed 
for  reuse  and  cross  leveraged  the  more  that  can  be  accomplished. 

This paper has shown some excellent areas for continued research in database management
theories. Projecting these areas into the future, the time line of database evolution looks like this:


Figure 16 - Projected Database Evolution (time line of database eras, beginning with prehistory)